Skip to content

2.1.456

Choose a tag to compare

@hashnadz hashnadz released this 11 Mar 20:25
· 102 commits to dev since this release
898d5b3

🚀 Arthur Engine Release

March 12, 2026

This release delivers a comprehensive UI modernization, enhanced evaluation workflows, and improved agent task management alongside critical security updates and performance optimizations.


User Experience & Interface Enhancements

Navigation Consolidation

  • Unified all major product areas into streamlined tabbed interfaces, replacing scattered navigation with intuitive single-entry points
  • Consolidated RAG functionality into unified navigation with Notebooks, Experiments, and Configurations tabs
  • Merged Prompt capabilities into single entry point with Prompts, Notebooks, and Experiments tabs
  • Combined Evaluate features into tabbed view with Evals Management, Continuous Evals, and Results
  • Simplified Test section by merging Agentic Experiments and Agentic Notebooks into unified interface
  • Moved global settings (Model Providers, API Keys) from task sidebar to dedicated settings gear menu

Dark Mode & Theme Improvements

  • Fixed dark mode contrast issues across all UI components and standardized table styling consistency
  • Replaced all Tailwind color classes with MUI theme colors for automatic dark mode support
  • Converted native HTML elements to MUI components for better accessibility and consistent theming

The navigation redesign significantly reduces cognitive load while the theme improvements ensure a polished experience across all viewing modes.


Evaluation & Experiment Enhancements

Continuous Evaluation Workflows

  • Added visual span selector for continuous eval creation, allowing users to select data directly from trace viewer instead of manual typing
  • Introduced inline eval creation from trace viewer with side-by-side span inspection
  • Added evaluate traces modal accessible directly from trace overview with streamlined creation flow
  • Enabled submission of continuous evaluations without requiring description field
  • Added notification system to prompt users for review when evaluation versions are upgraded

Experiment Management

  • Improved experiment loading state derivation from experiment and test case status
  • Added clickable trace ID links in agentic experiment test cases that open trace viewer in new tab
  • Enhanced prompt experiment stability by converting to synchronous execution mode

These improvements streamline the evaluation creation process and provide better visibility into experiment progress.


Agent Task Management & Monitoring

Task Organization & Discovery

  • Added comprehensive filter, sort, and visibility controls to All Tasks page with activity window filters
  • Implemented task archival and unarchival capabilities with proper rule and metrics handling
  • Enhanced task metadata with enriched agent information including tools, sub-agents, models, and infrastructure
  • Added global polling system for agent tasks with proper duplicate execution prevention
  • Improved trace count accuracy in agent discovery with synchronous execution support

Task Interface Improvements

  • Merged task details into overview page as modal dialog for streamlined task management
  • Standardized page headers, action buttons, and empty states across all task views
  • Updated task navigation to show actual task names and improved subtitle copy

Agent task management is now more efficient with better filtering, archival capabilities, and comprehensive metadata visibility.


Data Management & Analysis

Dataset & Transform Operations

  • Enhanced dataset search to query full dataset instead of only current page results
  • Added wildcard transform support in both UI and backend with visual configuration
  • Implemented recursive search for span selector with expanded attribute matching
  • Prevented transform deletion when dependent entities exist with clear user messaging
  • Added confirmation messages after creating dataset transforms from trace viewer

Trace & Span Analysis

  • Made traces table sort arrows functional across traces, spans, sessions, and users tables
  • Fixed trace count deduplication when filters are active for accurate totals
  • Displayed skipped evaluations in gray in trace viewer to distinguish from failures
  • Enhanced span dataset addition with expanded search capabilities across multiple attributes

Data analysis workflows are now more intuitive with functional sorting, accurate counts, and better visual indicators.


Infrastructure & Performance

Security Updates

  • Updated pypdf to address multiple security vulnerabilities including infinite loop and memory exhaustion attacks
  • Updated NLTK to patch critical vulnerability in downloader component
  • Updated Flask to latest security release

Configuration & Optimization

  • Added thread pool configuration with proper defaults and environment variable support
  • Fixed GCP span kind storage issues in span_kind column
  • Improved prompt playground to fetch all paginated prompts instead of only first page
  • Enhanced error handling for Anthropic API compatibility requirements

These infrastructure improvements ensure better security posture and more reliable performance across the platform.


Release notes generated by Louisa