2.1.456
🚀 Arthur Engine Release
March 12, 2026
This release delivers a comprehensive UI modernization, enhanced evaluation workflows, and improved agent task management alongside critical security updates and performance optimizations.
User Experience & Interface Enhancements
Navigation Consolidation
- Unified all major product areas into streamlined tabbed interfaces, replacing scattered navigation with intuitive single-entry points
- Consolidated RAG functionality into unified navigation with Notebooks, Experiments, and Configurations tabs
- Merged Prompt capabilities into single entry point with Prompts, Notebooks, and Experiments tabs
- Combined Evaluate features into tabbed view with Evals Management, Continuous Evals, and Results
- Simplified Test section by merging Agentic Experiments and Agentic Notebooks into unified interface
- Moved global settings (Model Providers, API Keys) from task sidebar to dedicated settings gear menu
Dark Mode & Theme Improvements
- Fixed dark mode contrast issues across all UI components and standardized table styling consistency
- Replaced all Tailwind color classes with MUI theme colors for automatic dark mode support
- Converted native HTML elements to MUI components for better accessibility and consistent theming
The navigation redesign significantly reduces cognitive load while the theme improvements ensure a polished experience across all viewing modes.
Evaluation & Experiment Enhancements
Continuous Evaluation Workflows
- Added visual span selector for continuous eval creation, allowing users to select data directly from trace viewer instead of manual typing
- Introduced inline eval creation from trace viewer with side-by-side span inspection
- Added evaluate traces modal accessible directly from trace overview with streamlined creation flow
- Enabled submission of continuous evaluations without requiring description field
- Added notification system to prompt users for review when evaluation versions are upgraded
Experiment Management
- Improved experiment loading state derivation from experiment and test case status
- Added clickable trace ID links in agentic experiment test cases that open trace viewer in new tab
- Enhanced prompt experiment stability by converting to synchronous execution mode
These improvements streamline the evaluation creation process and provide better visibility into experiment progress.
Agent Task Management & Monitoring
Task Organization & Discovery
- Added comprehensive filter, sort, and visibility controls to All Tasks page with activity window filters
- Implemented task archival and unarchival capabilities with proper rule and metrics handling
- Enhanced task metadata with enriched agent information including tools, sub-agents, models, and infrastructure
- Added global polling system for agent tasks with proper duplicate execution prevention
- Improved trace count accuracy in agent discovery with synchronous execution support
Task Interface Improvements
- Merged task details into overview page as modal dialog for streamlined task management
- Standardized page headers, action buttons, and empty states across all task views
- Updated task navigation to show actual task names and improved subtitle copy
Agent task management is now more efficient with better filtering, archival capabilities, and comprehensive metadata visibility.
Data Management & Analysis
Dataset & Transform Operations
- Enhanced dataset search to query full dataset instead of only current page results
- Added wildcard transform support in both UI and backend with visual configuration
- Implemented recursive search for span selector with expanded attribute matching
- Prevented transform deletion when dependent entities exist with clear user messaging
- Added confirmation messages after creating dataset transforms from trace viewer
Trace & Span Analysis
- Made traces table sort arrows functional across traces, spans, sessions, and users tables
- Fixed trace count deduplication when filters are active for accurate totals
- Displayed skipped evaluations in gray in trace viewer to distinguish from failures
- Enhanced span dataset addition with expanded search capabilities across multiple attributes
Data analysis workflows are now more intuitive with functional sorting, accurate counts, and better visual indicators.
Infrastructure & Performance
Security Updates
- Updated pypdf to address multiple security vulnerabilities including infinite loop and memory exhaustion attacks
- Updated NLTK to patch critical vulnerability in downloader component
- Updated Flask to latest security release
Configuration & Optimization
- Added thread pool configuration with proper defaults and environment variable support
- Fixed GCP span kind storage issues in span_kind column
- Improved prompt playground to fetch all paginated prompts instead of only first page
- Enhanced error handling for Anthropic API compatibility requirements
These infrastructure improvements ensure better security posture and more reliable performance across the platform.
Release notes generated by Louisa