Releases: rhesis-ai/rhesis

Platform v0.6.8

05 Mar 11:05

Platform Release

This release includes the following component versions:

  • Backend 0.6.7
  • Frontend 0.6.8
  • SDK 0.6.8

Summary of Changes

Backend v0.6.7:

  • Added multi-file attachment support for tests, traces, and playground, including file upload/download, format filters, and UI elements.
  • Enhanced chatbot functionality with file upload support and JSON output mode.
  • Improved test run detail view with metadata, file sections, and trace drawer integration.
  • Added LiteLLM Proxy, Azure AI, and Azure OpenAI provider support.

Frontend v0.6.8:

  • Added support for file attachments to tests, test results, and chat functionality, including UI elements for upload, download, and display.
  • Enhanced test run detail view with metadata, context, pretty-printed JSON, file attachments, and navigation improvements.
  • Introduced LiteLLM Proxy, Azure AI, and Azure OpenAI provider support with optional API base and version configurations.
  • Improved test coverage with new unit, integration, E2E, and accessibility tests, along with fixes for various bugs and UI issues.

SDK v0.6.8:

  • Added multi-file attachment support for tests, traces, and playground, including file upload, download, and display in the UI.
  • Added Azure AI Studio and Azure OpenAI providers.
  • Added connect() blocking API for connector-only scripts.
  • Enhanced SDK model factory with registry-driven creation for easier provider management.

See individual component changelogs for detailed changes:

SDK v0.6.8

05 Mar 11:05

Added

  • Added multi-file attachment support for tests, traces, and playground. This includes the ability to upload, download, and delete files associated with tests and test results.
  • Added file format filters (to_anthropic, to_openai, to_gemini) for transforming input files into provider-specific content formats.
  • Added file upload support to the /chat endpoint, allowing users to include files in chatbot conversations.
  • Added file attachment support to WebSocket communication, enabling file transfers in chat applications.
  • Added file upload button and drag-and-drop support to Playground chat.
  • Added file download functionality to FileAttachmentList and MessageBubble components.
  • Added file attachment support to multi-turn tests in Penelope.
  • Added File entity to the SDK with upload, download, and delete capabilities.
  • Added Azure AI Studio and Azure OpenAI providers as new LLM providers.
  • Added connect() blocking API for connector-only scripts.
  • Added JSON and Excel file upload support to the Playground.
  • Added metadata and context as collapsible sections in the Test Run detail view.
  • Added trace drawer and file sections to the Test Run detail view.
  • Added required field validation to the metric creation form.
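
The notes name the format filters (to_anthropic, to_openai, to_gemini) but do not show them. As a rough illustration only, the sketch below converts an image attachment into the content-part shapes used by each provider's public API. The attachment layout (filename, content_type, data) and the function bodies are assumptions, not the Rhesis implementation; only the filter names and the data field name come from these notes.

```python
import base64


def make_attachment(filename: str, content_type: str, raw: bytes) -> dict:
    """Build a hypothetical attachment dict with base64 content under `data`."""
    return {
        "filename": filename,
        "content_type": content_type,
        "data": base64.b64encode(raw).decode("ascii"),
    }


def to_openai(file: dict) -> dict:
    """Image content part in the OpenAI Chat Completions style (data URL)."""
    url = f"data:{file['content_type']};base64,{file['data']}"
    return {"type": "image_url", "image_url": {"url": url}}


def to_anthropic(file: dict) -> dict:
    """Image content block in the Anthropic Messages style."""
    return {
        "type": "image",
        "source": {
            "type": "base64",
            "media_type": file["content_type"],
            "data": file["data"],
        },
    }


def to_gemini(file: dict) -> dict:
    """Inline data part in the Gemini REST style."""
    return {"inline_data": {"mime_type": file["content_type"], "data": file["data"]}}
```

Each function returns a plain dict, so the same attachment can be rendered for whichever provider an endpoint targets. Non-image files would need provider-specific handling that this sketch leaves out.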

Changed

  • Renamed run_connector to connect in the SDK.
  • Replaced test type magic strings with constants.
  • Moved the file attachment button inside the text input in the Playground chat.
  • Enhanced the constructors of OpenAILLM and OpenRouterLLM to accept additional keyword arguments.
  • Updated Node.js version to 24 in CI configurations and Dockerfiles.
  • Standardized file data field name from content_base64 to data.
  • Increased WebSocket max message size from 64KB to 10MB.
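
Clients that still send content_base64 would need to switch to data. The following is a minimal sketch of such a migration shim together with a size check against the new WebSocket limit. The helper names are hypothetical, "10MB" is read here as 10 MiB, and the JSON encoding of the message is an assumption.

```python
import json

MAX_MESSAGE_SIZE = 10 * 1024 * 1024  # assumption: "10MB" interpreted as 10 MiB


def normalize_file_payload(payload: dict) -> dict:
    """Return a copy that uses `data` in place of the legacy `content_base64` key."""
    normalized = dict(payload)
    if "content_base64" in normalized:
        legacy = normalized.pop("content_base64")
        # If both keys are present, the newer `data` value wins.
        normalized.setdefault("data", legacy)
    return normalized


def fits_in_message(payload: dict, limit: int = MAX_MESSAGE_SIZE) -> bool:
    """Check the UTF-8 size of the JSON-encoded payload against the limit."""
    return len(json.dumps(payload).encode("utf-8")) <= limit
```

Base64 inflates content by roughly a third, so a file somewhat smaller than the limit can still produce a message that exceeds it.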

Fixed

  • Fixed an issue where test_set_type_id was not included when creating test sets from the manual writer.
  • Fixed test set association and navigation issues in the manual test writer.
  • Fixed focus loss in metric evaluation steps TextFields.
  • Fixed lazy-load failures in mixin relationship properties in the backend.
  • Fixed an issue where test_type_id was overwritten on test updates.
  • Fixed an issue where polyphemus_access was null in user settings.
  • Fixed an issue where the WebSocket tests hung because of an incorrect MAX_MESSAGE_SIZE.
  • Fixed an issue where MetricDataFactory was generating invalid metric test data.
  • Fixed an issue where MarkdownContent was crashing when rendering JSON objects.
  • Resolved TypeScript errors in model providers and test creation.
  • Rebased file migration on litellm provider migration.
  • Handled optional prompt_id in test components.
  • Addressed PR review feedback.
  • Added default for user_id in TestRunCreate schema.

Removed

  • Removed the [DEBUG] prefix from API error logs.

Frontend v0.6.8

05 Mar 11:05

Added

  • Added frontend E2E CRUD tests, unit tests, Firefox coverage, and accessibility tests.
  • Added Playwright CRUD interaction specs for tokens, test sets, projects, and endpoints.
  • Added TestSetsPage and TokensPage page objects for E2E tests.
  • Added Firefox browser project to playwright.config.ts.
  • Added unit tests for TokensClient, ProjectsClient, TestSetsClient.
  • Installed jest-axe and added accessibility tests for common components.
  • Added multi-file attachment support for tests, traces, and playground, including file upload/download/delete endpoints and UI components.
  • Added file format filters and trace file linking for endpoint invocations.
  • Added file upload support to the /chat endpoint.
  • Added file attachment UI to Playground Chat.
  • Added file download to FileAttachmentList and MessageBubble.
  • Added file attachment support to multi-turn tests in Penelope.
  • Added file attachment support to SDK entities.
  • Added metadata and context as collapsible sections in test run detail view.
  • Added "Go to Test" button linking to test detail page in test run detail view.
  • Added trace drawer and file sections to test run detail view.
  • Added required field validation to metric creation form.
  • Added LiteLLM Proxy, Azure AI, and Azure OpenAI provider support.
  • Added hook and component tests, expanding MSW infrastructure.
  • Added API client integration tests for BaseApiClient, TestsClient, TestRunsClient, and EndpointsClient.
  • Added page-level integration tests for grid components.
  • Added detail-page integration tests.

Changed

  • Replaced test type magic strings with constants.
  • Renamed file data field from content_base64 to data for consistency.
  • Moved file attachment button inside text input in Playground Chat.
  • Enhanced test run detail view with metadata, context, and JSON content display.
  • Updated Node.js version to 24 in CI configurations and Dockerfiles.
  • Moved e2e tests to tests/e2e/ and dropped coverage threshold.
  • Updated @icons-pack/react-simple-icons to v13.12.0.
  • Moved file position query to CRUD layer.

Fixed

  • Used correct TestResultStatus values in accessibility tests.
  • Fixed CI failures in E2E and accessibility tests.
  • Fixed remaining E2E test failures related to onboarding checklist, DataGrid aria-label matching, and endpoint navigation.
  • Fixed test-sets E2E tests targeting MUI Select trigger.
  • Created .auth directory before writing storageState to prevent auth setup failures.
  • Included test_set_type_id when creating test sets from manual writer.
  • Fixed manual test writer test set association and navigation.
  • Resolved focus loss in metric evaluation steps TextFields.
  • Rebased file migration on litellm provider migration.
  • Used theme borderRadius and passed missing sessionToken.
  • Added default for user_id in TestRunCreate schema.
  • Resolved TypeScript errors in model providers and test creation.
  • Addressed PR review feedback for file filters and upload positions.
  • Handled optional prompt_id in test components.
  • Handled null polyphemus_access in user settings.
  • Patched MAX_MESSAGE_SIZE in websocket tests to prevent hang.
  • Added required score_type fields to metric test data factories.
  • Handled non-string content in MarkdownContent.
  • Prevented input focus loss from inline component definitions.
  • Resolved EndpointFormAutoConfigure test timeouts.
  • Lowered branch coverage threshold to match CI measurement.
  • Resolved TypeScript type errors in test fixtures.
  • Handled invalid test run ID gracefully with notFound().
  • Used main content locator in POM waitForContent instead of grid.
  • Updated auth.setup.ts storageState path to tests/e2e/.auth/.
  • Simplified conditional checks for optional API key in models.
  • Corrected formatting in ConnectionDialog for azure_ai provider.
  • Handled lazy-load failures in mixin relationship properties.
  • Used CRUD layer for test set attribute updates.
  • Removed [DEBUG] prefix from API error logs.

Removed

  • Removed outdated .nvmrc file specifying Node.js version 20.19.5.

Backend v0.6.7

05 Mar 11:05

Added

  • Added multi-file attachment support for tests, traces, and playground.
  • Added file upload and removal functionality to the frontend for tests.
  • Added file format filters and trace file linking for endpoint invocations.
  • Added file upload support to the /chat endpoint.
  • Added file attachment UI to Playground chat.
  • Added file download functionality to FileAttachmentList and MessageBubble.
  • Added file attachment support to multi-turn tests in Penelope.
  • Added file attachment support to SDK entities.
  • Added JSON and Excel file upload support to the Playground.
  • Added metadata and context as collapsible sections in the Test Run detail view.
  • Added trace drawer and file sections to the Test Run detail view.
  • Added required field validation to the metric creation form.
  • Added Azure AI Studio and Azure OpenAI provider support.
  • Added optional parameters for api_base and api_version to LiteLLM and its derived classes.

Changed

  • Renamed file data field from content_base64 to data for consistency.
  • Moved the file attachment button inside the text input in Playground chat.
  • Enhanced the Test Run detail view with improved UI and information display.
  • Updated Node.js version to 24 in CI configurations and Dockerfiles.
  • Updated SDK to use exclude_none=True in BaseEntity.push().
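
exclude_none=True is the name of a Pydantic serialization option that omits fields whose value is None. Assuming BaseEntity serializes that way (these notes do not say so explicitly), the effect on an update can be sketched with plain dicts. The merge function stands in for a hypothetical backend and is not taken from the codebase.

```python
def drop_none(payload: dict) -> dict:
    """Mimic exclude_none=True for a flat dict: omit keys whose value is None."""
    return {key: value for key, value in payload.items() if value is not None}


def apply_update(stored: dict, payload: dict) -> dict:
    """Hypothetical backend merge: every key present in the payload overwrites."""
    return {**stored, **payload}


stored = {"name": "accuracy", "threshold": 0.8}
update = {"name": "accuracy-v2", "threshold": None}  # threshold never set locally

naive = apply_update(stored, update)            # threshold is overwritten with None
safe = apply_update(stored, drop_none(update))  # threshold keeps its stored value
```

The trade-off of this approach is that a field can no longer be cleared by sending None; an explicit reset would need a different mechanism.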

Fixed

  • Fixed an issue where test_set_type_id was missing when creating test sets from the manual writer.
  • Fixed manual test writer test set association and navigation issues.
  • Fixed focus loss in metric evaluation steps TextFields.
  • Fixed an issue where lazy-load failures occurred in mixin relationship properties after deletion.
  • Fixed an issue where raw db.query() was used in update_test_set_attributes.
  • Fixed the file migration to rebase on the litellm provider migration.
  • Fixed TypeScript errors in model providers and test creation.
  • Fixed an issue where Jinja file filters returned JSON strings instead of Python objects.
  • Fixed a test key mismatch in output_providers and results.
  • Fixed an issue where polyphemus_access could be null in user settings.
  • Fixed an issue where the websocket tests hung due to an incorrect message size limit.
  • Fixed metric test data factories to include required score_type fields.
  • Fixed an issue where the Markdown component crashed when endpoint responses contained JSON objects.
  • Fixed an issue where test_type_id was overwritten on test update.
  • Fixed an issue where test_set_id was present in the TestBase schema.
  • Handled optional prompt_id in test components.

Removed

  • Removed the [DEBUG] prefix from API error logs.

Platform v0.6.7

02 Mar 10:05

Platform Release

This release includes the following component versions:

  • Backend 0.6.6
  • Frontend 0.6.7
  • SDK 0.6.7
  • Polyphemus 0.2.7

Summary of Changes

Backend v0.6.6:

  • Added explicit min_turns parameter to test configuration, allowing control over early stopping behavior.
  • Improved turn budget handling in Penelope, including turn-aware prompts and deepening strategies, and preventing premature stopping.
  • Enhanced metric handling, including pagination on the frontend and passing conversation_history to conversational metrics.
  • Added methods to TestSet for bulk association/disassociation of tests.

Frontend v0.6.7:

  • Enhanced multi-turn evaluation with accurate turn counting (user-assistant pairs), min_turns/max_turns configuration, and SDK improvements for metric handling.
  • Added explicit min_turns parameter for early stop control in conversational tests, configurable through a range slider in the frontend.
  • Improved metric handling in the SDK with create-or-update support, ID preservation, and fixes for null value overwrites.
  • Added methods to associate/disassociate tests with TestSets without recreating them.
  • Fixed metrics page pagination to display all backend type tabs.

SDK v0.6.7:

  • Added explicit min_turns parameter to control early stopping in tests, replacing instruction-based regex parsing.
  • Improved turn budget handling in Penelope, including turn-aware prompts, explicit min/max turn labeling, and preventing spurious turn count criteria in goal judging.
  • Enhanced metric handling in the SDK, including create-or-update support for metric push, preservation of IDs on pull, and fixes for conversational metric evaluation.
  • Added methods to TestSet for bulk association/disassociation of tests, improving test set management.

Polyphemus v0.2.7:

  • Initial release or no significant changes.

See individual component changelogs for detailed changes:

SDK v0.6.7

02 Mar 10:04

Added

  • Added explicit min_turns parameter for early stop control in test configurations.
  • Added test association methods (add_tests(), remove_tests()) to TestSet for bulk linking tests to test sets.
  • Added min_turns and max_turns to import/export functionality (CSV, JSON, JSONL) and synthesizer.

Changed

  • Replaced instruction-based regex parsing for minimum turns with an explicit min_turns parameter on execute_test().
  • Replaced the max turns input on the frontend with a turn configuration range slider, allowing control of both min_turns and max_turns.
  • Standardized naming: max_iterations is now max_turns throughout the SDK and backend.
  • Improved turn budget awareness and deepening strategies for the Penelope agent.
  • Enhanced metric update functionality to prevent overwriting with null values.
  • Updated metrics page to paginate metrics fetch, ensuring all backend type tabs are displayed.
  • Improved client-side pagination for the metrics grid.

Fixed

  • Prevented the goal judge from creating spurious turn count criteria.
  • Addressed premature stopping and turn budget confusion in the Penelope agent.
  • Stopped leaking turn budget into goal judge instructions.
  • Corrected turn counting in conversational metrics to count user-assistant pairs.
  • Prevented early stopping before reaching max_turns.
  • Fixed focus loss and stale save button in the metric editor.
  • Handled None turn parameters in test configuration.
  • Fixed max-turns stop reason detection.
  • Ensured conversational metrics receive conversation_history during evaluation.
  • Fixed metric ID not being set after creation.
  • Fixed default metric_scope for ConversationalJudge and GoalAchievementJudge.
  • Fixed pagination robustness guards in the frontend.

Polyphemus v0.2.7

02 Mar 10:04

Added

  • Added support for specifying custom HTTP headers in Polyphemus requests. This allows users to authenticate with APIs that require specific header-based authentication schemes.
  • Added a new --timeout option to the Polyphemus CLI to allow users to configure the request timeout duration.

Changed

  • Improved error handling for network connectivity issues. Polyphemus now provides more informative error messages when encountering connection errors.
  • Updated the default user agent string to include the Polyphemus version number.

Fixed

  • Fixed an issue where Polyphemus would incorrectly parse URLs containing special characters.
  • Resolved a bug that caused Polyphemus to crash when encountering malformed JSON responses.

Frontend v0.6.7

02 Mar 10:04

Added

  • Added explicit min_turns parameter for early stop control in conversational evaluations.
  • Added add_tests() and remove_tests() methods to the SDK TestSet for bulk test association.
  • Added min_turns and max_turns support to test configuration import/export and synthesizer.
  • Added client-side pagination to the metrics grid.

Changed

  • Replaced the single "max turns" input with a turn configuration range slider on the test detail page and dual number inputs in the manual test writer, allowing configuration of both min_turns and max_turns.
  • Renamed max_iterations to max_turns throughout the codebase to better reflect the semantics of conversation turns.
  • Updated the conversational judge to count turns as user-assistant pairs instead of individual messages.
  • Improved early stopping behavior in conversational evaluations, preventing early termination before reaching 80% of max_turns.
  • The push() method in the SDK now supports both creating (POST) and updating (PUT) metrics.
  • Updated metrics page to paginate metrics fetch to show all backend type tabs.

Fixed

  • Fixed focus loss and stale save button in the metric editor.
  • Fixed metric update overwriting with null values in the backend.
  • Fixed an issue where conversational metrics were not receiving the conversation_history parameter during evaluation.
  • Fixed an issue where the metrics page was not displaying all backend type tabs due to a fetch limit.
  • Fixed an issue where the max-turns stop reason detection was using a stale "max iterations" string.

Backend v0.6.6

02 Mar 10:04

Added

  • Added explicit min_turns parameter for early stop control in tests.
  • Added min_turns and max_turns support to import/export and synthesizer features.
  • Added test association methods (add_tests(), remove_tests()) to the SDK's TestSet class for bulk test linking.
  • Added client-side pagination to the metrics grid in the frontend.

Changed

  • Replaced the maximum turns input in the frontend with a turn configuration range slider, allowing users to set both min_turns and max_turns.
  • Standardized naming: max_iterations has been renamed to max_turns across the backend, SDK, and documentation to reflect the actual semantics of conversation turns.
  • Updated the frontend to use an 80% default for min_turns in the test detail slider, matching the backend/Penelope default when min_turns is not explicitly set.
  • Improved turn budget awareness and deepening strategies in Penelope, ensuring every turn contributes substantive testing value.
  • Refactored Penelope's orchestration to simplify the codebase and improve evaluator voice.

Fixed

  • Fixed an issue where the goal judge was creating spurious turn count criteria, leading to incorrect test failures.
  • Fixed premature stopping issues in Penelope by decoupling goal-impossible conditions from min_turns and clarifying turn budget.
  • Fixed a bug where metric updates could overwrite existing data with null values.
  • Fixed focus loss and stale save button issues in the metric editor in the frontend.
  • Fixed an issue where conversational metrics were not receiving the conversation_history, causing errors.
  • Fixed metrics page pagination to show all backend type tabs, even with a large number of metrics.
  • Fixed an issue where the conversational judge was incorrectly counting turns.
  • Fixed an issue where conversations could stop early, before reaching max_turns.
  • Fixed an issue where the push() method was discarding the backend response, leaving metric.id as None after creation.
  • Fixed max-turns stop reason detection to check for "maximum turns" instead of the stale "max iterations" string.
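
The push() fix (keeping the backend response so that metric.id is set after creation) fits a create-or-update pattern along these lines. The Metric class, the /metrics paths, and the transport callable are hypothetical stand-ins for illustration, not the SDK's actual API.

```python
from dataclasses import asdict, dataclass
from typing import Callable, Optional

# Hypothetical transport: (method, path, body) -> response dict
Transport = Callable[[str, str, dict], dict]


@dataclass
class Metric:
    name: str
    id: Optional[str] = None

    def push(self, send: Transport) -> "Metric":
        """Create the metric with POST when it has no id, otherwise update with PUT."""
        body = {k: v for k, v in asdict(self).items() if v is not None}
        if self.id is None:
            response = send("POST", "/metrics", body)
        else:
            response = send("PUT", f"/metrics/{self.id}", body)
        # Keep the response so an id assigned by the backend is not lost.
        self.id = response.get("id", self.id)
        return self


calls: list = []


def fake_backend(method: str, path: str, body: dict) -> dict:
    """In-memory stand-in for the API, used only in this sketch."""
    calls.append((method, path))
    return {**body, "id": body.get("id", "metric-1")}
```

Calling push(fake_backend) twice on Metric(name="accuracy") issues a POST followed by a PUT to /metrics/metric-1, because the id from the first response is stored on the object.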

Removed

  • Removed unnecessary indirection layers in Penelope's orchestration, simplifying the codebase.

Platform v0.6.6

26 Feb 19:04
e2787cc

Platform Release

This release includes the following component versions:

  • Backend 0.6.5
  • Frontend 0.6.6
  • SDK 0.6.6
  • Polyphemus 0.2.6

Summary of Changes

Backend v0.6.5:

  • Security: Mitigated OAuth callback URL host header poisoning vulnerability. Addressed multiple Dependabot security alerts by updating vulnerable dependencies (cryptography, pillow, fastmcp, redis, langgraph-checkpoint, marshmallow, virtualenv, mammoth, langchain-core).
  • Polyphemus Integration: Added core Polyphemus integration including service delegation tokens, access control system with request/grant workflow, and frontend UI for access requests.
  • Tracing: Implemented conversation-based tracing across SDK, backend, and frontend, enabling the linking of multi-turn conversation interactions under a shared trace_id. Added UI improvements for conversation traces, including turn navigation, edge labels, and resizable trace detail drawer.
  • Test Set Type Enforcement: Enforced the requirement of test_set_type on test set creation across the backend, frontend, and SDK, and enforced type-matching when assigning tests to test sets.

Frontend v0.6.6:

  • Added Polyphemus access request UI, including access request modal, model card UI states, and Polyphemus provider icon/logo.
  • Improved trace UI with conversation tracing support, including conversation icon in trace list, type filter buttons, Conversation View tab, turn labels, and turn navigation.
  • Enhanced trace graph view with turn labels on edges, progressive agent invocation count, and improved edge routing.
  • Fixed security vulnerabilities in frontend transitive dependencies.

SDK v0.6.6:

  • Enhanced Polyphemus integration with access control, delegation tokens, and UI.
  • Improved LLM error handling with retries, logging, and fallback mechanisms.
  • Added conversation-based tracing across SDK, backend, and frontend for multi-turn interactions.
  • Addressed multiple security vulnerabilities by updating dependencies and migrating to PyJWT.

Polyphemus v0.2.6:

  • Added rate limiting to the Polyphemus service.
  • Implemented access control and delegation tokens for Polyphemus authentication, including a request/grant workflow and frontend UI.
  • Deployed vLLM to Vertex AI for Polyphemus, including caching GCP credentials and adding retry logic.
  • Resolved multiple security vulnerabilities by updating dependencies, including migrating from python-jose to PyJWT.

See individual component changelogs for detailed changes: