🚀 Arthur Engine Release

March 12, 2026

This release delivers a comprehensive UI modernization, enhanced evaluation workflows, and improved agent task management alongside critical security updates and performance optimizations.

User Experience & Interface Enhancements

Navigation Consolidation

Unified all major product areas into streamlined tabbed interfaces, replacing scattered navigation with intuitive single entry points
Consolidated RAG functionality into a unified navigation area with Notebooks, Experiments, and Configurations tabs
Merged Prompt capabilities into a single entry point with Prompts, Notebooks, and Experiments tabs
Combined Evaluate features into a tabbed view with Evals Management, Continuous Evals, and Results
Simplified the Test section by merging Agentic Experiments and Agentic Notebooks into a unified interface
Moved global settings (Model Providers, API Keys) from the task sidebar to a dedicated settings gear menu

Dark Mode & Theme Improvements

Fixed dark mode contrast issues across all UI components and standardized table styling
Replaced all Tailwind color classes with MUI theme colors for automatic dark mode support
Converted native HTML elements to MUI components for better accessibility and consistent theming

The navigation redesign gives each product area a single entry point, and the theme improvements keep styling consistent in both light and dark modes.

Evaluation & Experiment Enhancements

Continuous Evaluation Workflows

Added a visual span selector for continuous eval creation, letting users select data directly from the trace viewer instead of typing it manually
Introduced inline eval creation from the trace viewer with side-by-side span inspection
Added an "evaluate traces" modal, accessible directly from the trace overview, with a streamlined creation flow
Made the description field optional when submitting continuous evaluations
Added a notification that prompts users to review their evals when evaluation versions are upgraded

Experiment Management

Improved how experiment loading states are derived from experiment and test case statuses
Added clickable trace ID links in agentic experiment test cases that open the trace viewer in a new tab
Improved prompt experiment stability by switching to synchronous execution

These improvements streamline the evaluation creation process and provide better visibility into experiment progress.

Agent Task Management & Monitoring

Task Organization & Discovery

Added filter, sort, and visibility controls, including activity window filters, to the All Tasks page
Added task archiving and unarchiving, with handling for the task's associated rules and metrics
Enriched task metadata with agent information, including tools, sub-agents, models, and infrastructure
Added a global polling system for agent tasks that prevents duplicate executions
Improved trace count accuracy in agent discovery with synchronous execution support
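
The notes do not describe how duplicate executions are prevented. One common pattern is to claim a task before polling it and skip any task whose previous poll is still running. The minimal Python sketch below assumes an in-process poller; the class and method names (AgentTaskPoller, try_start, poll_once) are hypothetical and are not Arthur Engine's API.

```python
from __future__ import annotations

import threading
from typing import Callable, Iterable


class AgentTaskPoller:
    """Hypothetical poller that skips any task whose previous poll is still running."""

    def __init__(self, poll_fn: Callable[[str], None]) -> None:
        self._poll_fn = poll_fn
        self._in_flight: set[str] = set()
        self._lock = threading.Lock()

    def try_start(self, task_id: str) -> bool:
        # Atomically claim the task; False means another poll already holds it.
        with self._lock:
            if task_id in self._in_flight:
                return False
            self._in_flight.add(task_id)
            return True

    def finish(self, task_id: str) -> None:
        # Release the claim so the task can be polled again on a later cycle.
        with self._lock:
            self._in_flight.discard(task_id)

    def poll_once(self, task_ids: Iterable[str]) -> list[str]:
        """Poll every task that is not already in flight; return the IDs that were polled."""
        polled: list[str] = []
        for task_id in task_ids:
            if not self.try_start(task_id):
                continue  # a concurrent poll is already handling this task
            try:
                self._poll_fn(task_id)
                polled.append(task_id)
            finally:
                self.finish(task_id)
        return polled
```

An in-process set only protects a single process; a deployment with several workers would need the claim to live somewhere shared, such as a database row or a distributed lock.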

Task Interface Improvements

Merged task details into the overview page as a modal dialog for streamlined task management
Standardized page headers, action buttons, and empty states across all task views
Updated task navigation to show actual task names, and improved subtitle copy

Agent task management is now more efficient with better filtering, archival capabilities, and comprehensive metadata visibility.

Data Management & Analysis

Dataset & Transform Operations

Enhanced dataset search to query the full dataset instead of only the current page of results
Added wildcard support for transforms in both the UI and the backend, with visual configuration
Added recursive search to the span selector, with expanded attribute matching
Blocked deletion of transforms that still have dependent entities, with a clear message explaining why
Added confirmation messages after creating dataset transforms from the trace viewer
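
The notes do not specify the wildcard syntax or the search algorithm. As an illustration only, the Python sketch below assumes span attributes are nested dicts and lists addressed by dotted paths, with `*` matching every key or list item at one level, and shows a recursive search that returns every path whose key or leaf value contains a query. The path syntax and the function names (resolve_path, find_paths) are hypothetical.

```python
from __future__ import annotations

from typing import Any


def resolve_path(data: Any, path: str) -> list[Any]:
    """Return every value matching a dotted path; '*' matches all keys or list items at that level."""
    current = [data]
    for segment in path.split("."):
        matched: list[Any] = []
        for node in current:
            if segment == "*":
                if isinstance(node, dict):
                    matched.extend(node.values())
                elif isinstance(node, list):
                    matched.extend(node)
            elif isinstance(node, dict) and segment in node:
                matched.append(node[segment])
            elif isinstance(node, list) and segment.isdigit() and int(segment) < len(node):
                matched.append(node[int(segment)])
        current = matched
    return current


def find_paths(data: Any, query: str, prefix: str = "") -> list[str]:
    """Recursively collect dotted paths whose key, or leaf value, contains `query` (case-insensitive)."""
    if isinstance(data, dict):
        items = list(data.items())
    elif isinstance(data, list):
        items = list(enumerate(data))
    else:
        return []
    needle = query.lower()
    paths: list[str] = []
    for key, value in items:
        path = f"{prefix}.{key}" if prefix else str(key)
        is_leaf = not isinstance(value, (dict, list))
        if needle in str(key).lower() or (is_leaf and needle in str(value).lower()):
            paths.append(path)
        # Descend into nested dicts and lists; leaves return an empty list.
        paths.extend(find_paths(value, query, path))
    return paths
```

For example, with a span whose attributes hold a list of input messages, the pattern `attributes.llm.input_messages.*.content` would return the content of every message rather than requiring one transform per list index.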

Trace & Span Analysis

Made the sort arrows functional in the traces, spans, sessions, and users tables
Fixed trace count deduplication so totals are accurate when filters are active
Displayed skipped evaluations in gray in the trace viewer to distinguish them from failures
Expanded search across multiple attributes when adding spans to datasets
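
The notes do not say how the count was fixed, but the usual cause of inflated totals is that filtering traces by span properties joins each trace to every matching span, so a trace with two matching spans is counted twice. The Python sketch below reproduces this with an in-memory SQLite database; the two-table schema and the function names are hypothetical, and COUNT(DISTINCT ...) is one standard remedy rather than the confirmed fix.

```python
import sqlite3

# Hypothetical schema: one row per trace, many spans per trace.
SCHEMA = """
CREATE TABLE traces (trace_id TEXT PRIMARY KEY);
CREATE TABLE spans (span_id TEXT PRIMARY KEY, trace_id TEXT, name TEXT);
"""

# Filter traces by the name of the spans they contain.
FILTERED_JOIN = "FROM traces t JOIN spans s ON s.trace_id = t.trace_id WHERE s.name = ?"


def naive_count(conn: sqlite3.Connection, span_name: str) -> int:
    # Counts joined rows: a trace with two matching spans is counted twice.
    return conn.execute(f"SELECT COUNT(*) {FILTERED_JOIN}", (span_name,)).fetchone()[0]


def deduplicated_count(conn: sqlite3.Connection, span_name: str) -> int:
    # Counts each matching trace once, however many of its spans matched.
    return conn.execute(
        f"SELECT COUNT(DISTINCT t.trace_id) {FILTERED_JOIN}", (span_name,)
    ).fetchone()[0]


def build_demo_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany("INSERT INTO traces VALUES (?)", [("t1",), ("t2",), ("t3",)])
    conn.executemany(
        "INSERT INTO spans VALUES (?, ?, ?)",
        [
            ("s1", "t1", "llm_call"),
            ("s2", "t1", "llm_call"),  # second matching span in the same trace
            ("s3", "t2", "llm_call"),
            ("s4", "t3", "tool_call"),
        ],
    )
    return conn
```

On the demo data, filtering for `llm_call` spans gives 3 from the naive query but 2 distinct traces.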

Data analysis workflows are now more intuitive with functional sorting, accurate counts, and better visual indicators.

Infrastructure & Performance

Security Updates

Updated pypdf to address multiple security vulnerabilities, including infinite-loop and memory-exhaustion issues
Updated NLTK to patch a critical vulnerability in its downloader component
Updated Flask to the latest security release

Configuration & Optimization

Added thread pool configuration with sensible defaults and environment variable overrides
Fixed storage of GCP span kinds in the span_kind column
Updated the prompt playground to fetch all pages of prompts instead of only the first
Improved error handling to meet Anthropic API compatibility requirements
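
The notes do not name the setting or its default. A minimal Python sketch of the pattern, reading a worker count from an environment variable and falling back to a default, might look like this. The variable name ENGINE_THREAD_POOL_MAX_WORKERS is hypothetical, and the fallback simply mirrors the standard library ThreadPoolExecutor default of min(32, cpu_count + 4) (Python 3.8 and later), not necessarily Arthur Engine's default.

```python
from __future__ import annotations

import os
from concurrent.futures import ThreadPoolExecutor
from typing import Mapping

# Hypothetical variable name; the release notes do not state the real one.
ENV_VAR = "ENGINE_THREAD_POOL_MAX_WORKERS"


def default_max_workers() -> int:
    # Same formula ThreadPoolExecutor uses when max_workers is None (Python 3.8+).
    return min(32, (os.cpu_count() or 1) + 4)


def resolve_max_workers(env: Mapping[str, str] | None = None) -> int:
    """Read the worker count from the environment, validating it, or fall back to the default."""
    env = os.environ if env is None else env
    raw = env.get(ENV_VAR)
    if raw is None or raw.strip() == "":
        return default_max_workers()
    try:
        value = int(raw)
    except ValueError:
        raise ValueError(f"{ENV_VAR} must be an integer, got {raw!r}") from None
    if value < 1:
        raise ValueError(f"{ENV_VAR} must be >= 1, got {value}")
    return value


def make_executor(env: Mapping[str, str] | None = None) -> ThreadPoolExecutor:
    # Named threads make pool workers easy to spot in stack dumps and logs.
    return ThreadPoolExecutor(
        max_workers=resolve_max_workers(env), thread_name_prefix="engine-worker"
    )
```

Failing fast on a malformed value at startup is usually preferable to silently falling back, since a typo in deployment configuration would otherwise go unnoticed.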

These infrastructure improvements strengthen the platform's security posture and make its performance more reliable.

Release notes generated by Louisa

2.1.456
