Releases · rhesis-ai/rhesis

05 Mar 11:05

rheo-app

v0.6.8

2f49ff9

Platform v0.6.8 Latest

Latest

Platform Release

This release includes the following component versions:

Backend 0.6.7
Frontend 0.6.8
SDK 0.6.8

Summary of Changes

Backend v0.6.7:

Added multi-file attachment support for tests, traces, and playground, including file upload/download, format filters, and UI elements.
Enhanced chatbot functionality with file upload support and JSON output mode.
Improved test run detail view with metadata, file sections, and trace drawer integration.
Added LiteLLM Proxy, Azure AI, and Azure OpenAI provider support.

Frontend v0.6.8:

Added support for file attachments to tests, test results, and chat functionality, including UI elements for upload, download, and display.
Enhanced test run detail view with metadata, context, pretty-printed JSON, file attachments, and navigation improvements.
Introduced LiteLLM Proxy, Azure AI, and Azure OpenAI provider support with optional API base and version configurations.
Improved test coverage with new unit, integration, E2E, and accessibility tests, along with fixes for various bugs and UI issues.

SDK v0.6.8:

Added multi-file attachment support for tests, traces, and playground, including file upload, download, and display in the UI.
Added Azure AI Studio and Azure OpenAI providers.
Added connect() blocking API for connector-only scripts.
Enhanced SDK model factory with registry-driven creation for easier provider management.

See individual component changelogs for detailed changes:

Assets 2

05 Mar 11:05

rheo-app

sdk-v0.6.8

2f49ff9

SDK v0.6.8

Added

Added multi-file attachment support for tests, traces, and playground. This includes the ability to upload, download, and delete files associated with tests and test results.
Added file format filters (to_anthropic, to_openai, to_gemini) for transforming input files into provider-specific content formats.
Added file upload support to the /chat endpoint, allowing users to include files in chatbot conversations.
Added file attachment support to WebSocket communication, enabling file transfers in chat applications.
Added file upload button and drag-and-drop support to Playground chat.
Added file download functionality to FileAttachmentList and MessageBubble components.
Added file attachment support to multi-turn tests in Penelope.
Added File entity to the SDK with upload, download, and delete capabilities.
Added Azure AI Studio and Azure OpenAI providers as new LLM providers.
Added connect() blocking API for connector-only scripts.
Added JSON and Excel file upload support to the Playground.
Added metadata and context as collapsible sections in the Test Run detail view.
Added trace drawer and file sections to the Test Run detail view.
Added required field validation to the metric creation form.

Changed

Renamed run_connector to connect in the SDK.
Replaced test type magic strings with constants.
Moved the file attachment button inside the text input in the Playground chat.
Enhanced the constructors of OpenAILLM and OpenRouterLLM to accept additional keyword arguments.
Updated Node.js version to 24 in CI configurations and Dockerfiles.
Standardized file data field name from content_base64 to data.
Increased WebSocket max message size from 64KB to 10MB.

Fixed

Fixed an issue where test_set_type_id was not included when creating test sets from the manual writer.
Fixed test set association and navigation issues in the manual test writer.
Fixed focus loss in metric evaluation steps TextFields.
Fixed lazy-load failures in mixin relationship properties in the backend.
Fixed an issue where the test_type_id was overwritten on test updates.
Fixed an issue where the polyphemus_access was null in user settings.
Fixed an issue where the websocket tests were hanging due to incorrect MAX_MESSAGE_SIZE.
Fixed an issue where MetricDataFactory was generating invalid metric test data.
Fixed an issue where MarkdownContent was crashing when rendering JSON objects.
Resolved TypeScript errors in model providers and test creation.
Resolved focus loss in metric evaluation steps TextFields.
Rebased file migration on litellm provider migration.
Handled optional prompt_id in test components.
Prevented test_type_id overwrite on update.
Addressed PR review feedback.
Added default for user_id in TestRunCreate schema.

Removed

Removed the [DEBUG] prefix from API error logs.

Assets 2

05 Mar 11:05

rheo-app

frontend-v0.6.8

2f49ff9

Frontend v0.6.8

Added

Added frontend E2E CRUD tests, unit tests, Firefox coverage, and accessibility tests.
Added Playwright CRUD interaction specs for tokens, test sets, projects, and endpoints.
Added TestSetsPage and TokensPage page objects for E2E tests.
Added Firefox browser project to playwright.config.ts.
Added unit tests for TokensClient, ProjectsClient, TestSetsClient.
Installed jest-axe and added accessibility tests for common components.
Added multi-file attachment support for tests, traces, and playground, including file upload/download/delete endpoints and UI components.
Added file format filters and trace file linking for endpoint invocations.
Added file upload support to the /chat endpoint.
Added file attachment UI to Playground Chat.
Added file download to FileAttachmentList and MessageBubble.
Added file attachment support to multi-turn tests in Penelope.
Added file attachment support to SDK entities.
Added metadata and context as collapsible sections in test run detail view.
Added "Go to Test" button linking to test detail page in test run detail view.
Added trace drawer and file sections to test run detail view.
Added required field validation to metric creation form.
Added LiteLLM Proxy, Azure AI, and Azure OpenAI provider support.
Added hook and component tests, expanding MSW infrastructure.
Added API client integration tests for BaseApiClient, TestsClient, TestRunsClient, and EndpointsClient.
Added page-level integration tests for grid components.
Added detail-page integration tests.

Changed

Replaced test type magic strings with constants.
Renamed file data field from content_base64 to data for consistency.
Moved file attachment button inside text input in Playground Chat.
Enhanced test run detail view with metadata, context, and JSON content display.
Updated Node.js version to 24 in CI configurations and Dockerfiles.
Moved e2e tests to tests/e2e/ and dropped coverage threshold.
Updated @icons-pack/react-simple-icons to v13.12.0.
Moved file position query to CRUD layer.

Fixed

Used correct TestResultStatus values in accessibility tests.
Fixed CI failures in E2E and accessibility tests.
Fixed remaining E2E test failures related to onboarding checklist, DataGrid aria-label matching, and endpoint navigation.
Fixed test-sets E2E tests targeting MUI Select trigger.
Created .auth directory before writing storageState to prevent auth setup failures.
Included test_set_type_id when creating test sets from manual writer.
Fixed manual test writer test set association and navigation.
Resolved focus loss in metric evaluation steps TextFields.
Rebased file migration on litellm provider migration.
Used theme borderRadius and passed missing sessionToken.
Added default for user_id in TestRunCreate schema.
Resolved TypeScript errors in model providers and test creation.
Addressed PR review feedback for file filters and upload positions.
Handled optional prompt_id in test components.
Handled null polyphemus_access in user settings.
Patched MAX_MESSAGE_SIZE in websocket tests to prevent hang.
Added required score_type fields to metric test data factories.
Handled non-string content in MarkdownContent.
Prevented input focus loss from inline component definitions.
Resolved EndpointFormAutoConfigure test timeouts.
Lowered branch coverage threshold to match CI measurement.
Resolved TypeScript type errors in test fixtures.
Handled invalid test run ID gracefully with notFound().
Used main content locator in POM waitForContent instead of grid.
Updated auth.setup.ts storageState path to tests/e2e/.auth/.
Simplified conditional checks for optional API key in models.
Corrected formatting in ConnectionDialog for azure_ai provider.
Handled lazy-load failures in mixin relationship properties.
Used CRUD layer for test set attribute updates.
Removed [DEBUG] prefix from API error logs.

Removed

Removed outdated .nvmrc file specifying Node.js version 20.19.5.

Assets 2

05 Mar 11:05

rheo-app

backend-v0.6.7

2f49ff9

Backend v0.6.7

Added

Added multi-file attachment support for tests, traces, and playground.
Added file upload and removal functionality to the frontend for tests.
Added file format filters and trace file linking for endpoint invocations.
Added file upload support to the /chat endpoint.
Added file attachment UI to Playground chat.
Added file download functionality to FileAttachmentList and MessageBubble.
Added file attachment support to multi-turn tests in Penelope.
Added file attachment support to SDK entities.
Added JSON and Excel file upload support to the Playground.
Added metadata and context as collapsible sections in the Test Run detail view.
Added trace drawer and file sections to the Test Run detail view.
Added required field validation to the metric creation form.
Added Azure AI Studio and Azure OpenAI provider support.
Added optional parameters for api_base and api_version to LiteLLM and its derived classes.

Changed

Renamed file data field from content_base64 to data for consistency.
Moved the file attachment button inside the text input in Playground chat.
Enhanced the Test Run detail view with improved UI and information display.
Updated Node.js version to 24 in CI configurations and Dockerfiles.
Updated SDK to use exclude_none=True in BaseEntity.push()

Fixed

Fixed an issue where test_set_type_id was missing when creating test sets from the manual writer.
Fixed manual test writer test set association and navigation issues.
Fixed focus loss in metric evaluation steps TextFields.
Fixed an issue where lazy-load failures occurred in mixin relationship properties after deletion.
Fixed an issue where raw db.query() was used in update_test_set_attributes.
Fixed the file migration to rebase on the litellm provider migration.
Fixed TypeScript errors in model providers and test creation.
Fixed an issue where Jinja file filters returned JSON strings instead of Python objects.
Fixed a test key mismatch in output_providers and results.
Fixed an issue where polyphemus_access could be null in user settings.
Fixed an issue where the websocket tests hung due to an incorrect message size limit.
Fixed metric test data factories to include required score_type fields.
Fixed an issue where the Markdown component crashed when endpoint responses contained JSON objects.
Fixed an issue where test_type_id was overwritten on test update.
Fixed an issue where test_set_id was present in the TestBase schema.
Handled optional prompt_id in test components.

Removed

Removed the [DEBUG] prefix from API error logs.

Assets 2

02 Mar 10:05

rheo-app

v0.6.7

101d0c1

Platform v0.6.7

Platform Release

This release includes the following component versions:

Backend 0.6.6
Frontend 0.6.7
SDK 0.6.7
Polyphemus 0.2.7

Summary of Changes

Backend v0.6.6:

Added explicit min_turns parameter to test configuration, allowing control over early stopping behavior.
Improved turn budget handling in Penelope, including turn-aware prompts and deepening strategies, and preventing premature stopping.
Enhanced metric handling, including pagination on the frontend and passing conversation_history to conversational metrics.
Added methods to TestSet for bulk association/disassociation of tests.

Frontend v0.6.7:

Enhanced multi-turn evaluation with accurate turn counting (user-assistant pairs), min_turns/max_turns configuration, and SDK improvements for metric handling.
Added explicit min_turns parameter for early stop control in conversational tests, configurable through a range slider in the frontend.
Improved metric handling in the SDK with create-or-update support, ID preservation, and fixes for null value overwrites.
Added methods to associate/disassociate tests with TestSets without recreating them.
Fixed metrics page pagination to display all backend type tabs.

SDK v0.6.7:

Added explicit min_turns parameter to control early stopping in tests, replacing instruction-based regex parsing.
Improved turn budget handling in Penelope, including turn-aware prompts, explicit min/max turn labeling, and preventing spurious turn count criteria in goal judging.
Enhanced metric handling in the SDK, including create-or-update support for metric push, preservation of IDs on pull, and fixes for conversational metric evaluation.
Added methods to TestSet for bulk association/disassociation of tests, improving test set management.

Polyphemus v0.2.7:
Initial release or no significant changes.

See individual component changelogs for detailed changes:

Assets 2

02 Mar 10:04

rheo-app

sdk-v0.6.7

101d0c1

SDK v0.6.7

Added

Added explicit min_turns parameter for early stop control in test configurations.
Added test association methods (add_tests(), remove_tests()) to TestSet for bulk linking tests to test sets.
Added min_turns and max_turns to import/export functionality (CSV, JSON, JSONL) and synthesizer.

Changed

Replaced instruction-based regex parsing for minimum turns with an explicit min_turns parameter on execute_test().
Replaced the max turns input on the frontend with a turn configuration range slider, allowing control of both min_turns and max_turns.
Standardized naming: max_iterations is now max_turns throughout the SDK and backend.
Improved turn budget awareness and deepening strategies for the Penelope agent.
Enhanced metric update functionality to prevent overwriting with null values.
Updated metrics page to paginate metrics fetch, ensuring all backend type tabs are displayed.
Improved client-side pagination for the metrics grid.

Fixed

Prevented the goal judge from creating spurious turn count criteria.
Addressed premature stopping and turn budget confusion in the Penelope agent.
Stopped leaking turn budget into goal judge instructions.
Corrected turn counting in conversational metrics to count user-assistant pairs.
Prevented early stopping before reaching max_turns.
Fixed focus loss and stale save button in the metric editor.
Handled None turn parameters in test configuration.
Fixed max-turns stop reason detection.
Ensured conversational metrics receive conversation_history during evaluation.
Fixed metric ID not being set after creation.
Fixed default metric_scope for ConversationalJudge and GoalAchievementJudge.
Fixed pagination robustness guards in the frontend.

Assets 2

02 Mar 10:04

rheo-app

polyphemus-v0.2.7

101d0c1

Polyphemus v0.2.7

Added

Added support for specifying custom HTTP headers in Polyphemus requests. This allows users to authenticate with APIs that require specific header-based authentication schemes.
Added a new --timeout option to the Polyphemus CLI to allow users to configure the request timeout duration.

Changed

Improved error handling for network connectivity issues. Polyphemus now provides more informative error messages when encountering connection errors.
Updated the default user agent string to include the Polyphemus version number.

Fixed

Fixed an issue where Polyphemus would incorrectly parse URLs containing special characters.
Resolved a bug that caused Polyphemus to crash when encountering malformed JSON responses.

Assets 2

02 Mar 10:04

rheo-app

frontend-v0.6.7

101d0c1

Frontend v0.6.7

Added

Added explicit min_turns parameter for early stop control in conversational evaluations.
Added add_tests() and remove_tests() methods to the SDK TestSet for bulk test association.
Added min_turns and max_turns support to test configuration import/export and synthesizer.
Added client-side pagination to the metrics grid.

Changed

Replaced the single "max turns" input with a turn configuration range slider on the test detail page and dual number inputs in the manual test writer, allowing configuration of both min_turns and max_turns.
Renamed max_iterations to max_turns throughout the codebase to better reflect the semantics of conversation turns.
Updated the conversational judge to count turns as user-assistant pairs instead of individual messages.
Improved early stopping behavior in conversational evaluations, preventing early termination before reaching 80% of max_turns.
The push() method in the SDK now supports both creating (POST) and updating (PUT) metrics.
Updated metrics page to paginate metrics fetch to show all backend type tabs.

Fixed

Fixed focus loss and stale save button in the metric editor.
Fixed metric update overwriting with null values in the backend.
Fixed an issue where conversational metrics were not receiving the conversation_history parameter during evaluation.
Fixed an issue where the metrics page was not displaying all backend type tabs due to a fetch limit.
Fixed an issue where the max-turns stop reason detection was using a stale "max iterations" string.

Assets 2

02 Mar 10:04

rheo-app

backend-v0.6.6

101d0c1

Backend v0.6.6

Added

Added explicit min_turns parameter for early stop control in tests.
Added min_turns and max_turns support to import/export and synthesizer features.
Added test association methods (add_tests(), remove_tests()) to the SDK's TestSet class for bulk test linking.
Added client-side pagination to the metrics grid in the frontend.

Changed

Replaced the maximum turns input in the frontend with a turn configuration range slider, allowing users to set both min_turns and max_turns.
Standardized naming: max_iterations has been renamed to max_turns across the backend, SDK, and documentation to reflect the actual semantics of conversation turns.
Updated the frontend to use an 80% default for min_turns in the test detail slider, matching the backend/Penelope default when min_turns is not explicitly set.
Improved turn budget awareness and deepening strategies in Penelope, ensuring every turn contributes substantive testing value.
Refactored Penelope's orchestration to simplify the codebase and improve evaluator voice.

Fixed

Fixed an issue where the goal judge was creating spurious turn count criteria, leading to incorrect test failures.
Fixed premature stopping issues in Penelope by decoupling goal-impossible conditions from min_turns and clarifying turn budget.
Fixed a bug where metric updates could overwrite existing data with null values.
Fixed focus loss and stale save button issues in the metric editor in the frontend.
Fixed an issue where conversational metrics were not receiving the conversation_history, causing errors.
Fixed metrics page pagination to show all backend type tabs, even with a large number of metrics.
Fixed an issue where the conversational judge was incorrectly counting turns.
Fixed an issue preventing early stopping before reaching max_turns.
Fixed an issue where the push() method was discarding the backend response, leaving metric.id as None after creation.
Fixed max-turns stop reason detection to check for "maximum turns" instead of the stale "max iterations" string.

Removed

Removed unnecessary indirection layers in Penelope's orchestration, simplifying the codebase.

Assets 2

26 Feb 19:04

rheo-app

v0.6.6

e2787cc

Platform v0.6.6

Platform Release

This release includes the following component versions:

Backend 0.6.5
Frontend 0.6.6
SDK 0.6.6
Polyphemus 0.2.6

Summary of Changes

Backend v0.6.5:

Security: Mitigated OAuth callback URL host header poisoning vulnerability. Addressed multiple Dependabot security alerts by updating vulnerable dependencies (cryptography, pillow, fastmcp, redis, langgraph-checkpoint, marshmallow, virtualenv, mammoth, langchain-core).
Polyphemus Integration: Added core Polyphemus integration including service delegation tokens, access control system with request/grant workflow, and frontend UI for access requests.
Tracing: Implemented conversation-based tracing across SDK, backend, and frontend, enabling the linking of multi-turn conversation interactions under a shared trace_id. Added UI improvements for conversation traces, including turn navigation, edge labels, and resizable trace detail drawer.
Test Set Type Enforcement: Enforced the requirement of test_set_type on test set creation across the backend, frontend, and SDK, and enforced type-matching when assigning tests to test sets.

Frontend v0.6.6:

Added Polyphemus access request UI, including access request modal, model card UI states, and Polyphemus provider icon/logo.
Improved trace UI with conversation tracing support, including conversation icon in trace list, type filter buttons, Conversation View tab, turn labels, and turn navigation.
Enhanced trace graph view with turn labels on edges, progressive agent invocation count, and improved edge routing.
Fixed security vulnerabilities in frontend transitive dependencies.

SDK v0.6.6:

Enhanced Polyphemus integration with access control, delegation tokens, and UI.
Improved LLM error handling with retries, logging, and fallback mechanisms.
Added conversation-based tracing across SDK, backend, and frontend for multi-turn interactions.
Addressed multiple security vulnerabilities by updating dependencies and migrating to PyJWT.

Polyphemus v0.2.6:

Added rate limiting to the Polyphemus service.
Implemented access control and delegation tokens for Polyphemus authentication, including a request/grant workflow and frontend UI.
Deployed vLLM to Vertex AI for Polyphemus, including caching GCP credentials and adding retry logic.
Resolved multiple security vulnerabilities by updating dependencies, including migrating from python-jose to PyJWT.

See individual component changelogs for detailed changes:

Assets 2

Releases: rhesis-ai/rhesis

Platform v0.6.8

Platform Release

Summary of Changes

Uh oh!

SDK v0.6.8

Added

Changed

Fixed

Removed

Uh oh!

Frontend v0.6.8

Added

Changed

Fixed

Removed

Uh oh!

Backend v0.6.7

Added

Changed

Fixed

Removed

Uh oh!

Platform v0.6.7

Platform Release

Summary of Changes

Uh oh!

SDK v0.6.7

Added

Changed

Fixed

Uh oh!

Polyphemus v0.2.7

Added

Changed

Fixed

Uh oh!

Frontend v0.6.7

Added

Changed

Fixed

Uh oh!

Backend v0.6.6

Added

Changed

Fixed

Removed

Uh oh!

Platform v0.6.6

Platform Release

Summary of Changes

Uh oh!