feat: Add adaptive tool execution analytics and intelligent retry system by Rakshitha-Ireddi · Pull Request #281 · snap-stanford/Biomni

Rakshitha-Ireddi · 2026-02-13T18:21:40Z

Adaptive Tool Execution Analytics and Intelligent Retry System

Authors

Ireddi Rakshitha
Devavarapu Yaswanth

Overview

This PR adds an Adaptive Tool Execution Analytics and Intelligent Retry System to Biomni. It covers performance monitoring, error classification with retries, and result caching for tool execution by the biomedical AI agent.

Key Features

1. Execution Analytics & Performance Tracking

Comprehensive Metrics: Tracks success rate, failure rate, average execution time, cache hit rate, and error patterns for each tool
Real-time Monitoring: Records every tool execution with detailed metadata (timestamp, execution time, error classification, retry count)
Analytics API: Provides easy-to-use methods to retrieve analytics summaries and error analysis as pandas DataFrames
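
As a rough illustration of how the metrics above could be derived from individual execution records, here is a minimal sketch. The names `CallRecord` and `ToolStats` and all field names are hypothetical and are not taken from `execution_analytics.py`; the actual data structures in this PR may differ.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class CallRecord:
    """One tool call. Hypothetical structure; field names are assumptions."""
    tool_name: str
    success: bool
    execution_time: float  # seconds
    cache_hit: bool = False
    error_type: Optional[str] = None
    retry_count: int = 0


@dataclass
class ToolStats:
    """Hypothetical per-tool aggregate computed from recorded calls."""
    records: List[CallRecord] = field(default_factory=list)

    def _rate(self, count: int) -> float:
        # Avoid division by zero when nothing has been recorded yet.
        return count / len(self.records) if self.records else 0.0

    @property
    def success_rate(self) -> float:
        return self._rate(sum(r.success for r in self.records))

    @property
    def failure_rate(self) -> float:
        return self._rate(sum(not r.success for r in self.records))

    @property
    def cache_hit_rate(self) -> float:
        return self._rate(sum(r.cache_hit for r in self.records))

    @property
    def average_execution_time(self) -> float:
        if not self.records:
            return 0.0
        return sum(r.execution_time for r in self.records) / len(self.records)

    def error_counts(self) -> Dict[str, int]:
        # Count failures per error category to expose error patterns.
        counts: Dict[str, int] = {}
        for r in self.records:
            if r.error_type is not None:
                counts[r.error_type] = counts.get(r.error_type, 0) + 1
        return counts


stats = ToolStats()
stats.records.append(CallRecord("query_gene_expression", True, 1.0))
stats.records.append(
    CallRecord("query_gene_expression", False, 3.0, error_type="TIMEOUT")
)
print(stats.success_rate, stats.average_execution_time, stats.error_counts())
```

Computing the rates on demand from the raw records keeps the sketch short; an implementation that handles many calls might maintain running counters instead.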

2. Intelligent Error Classification

Automatic Error Categorization: Classifies errors into 8 distinct types:
- TIMEOUT: Execution timeouts
- NETWORK: Network connectivity issues
- VALIDATION: Input validation errors
- RESOURCE: Resource exhaustion (memory, quota)
- PERMISSION: Access permission errors
- NOT_FOUND: Resource not found errors
- RATE_LIMIT: API rate limiting
- UNKNOWN: Unclassified errors
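
One plausible way to implement such a classifier is to check the exception type first and fall back to keywords in the message. The sketch below assumes this approach; the `classify_error` function, the enum values, and the keyword lists are illustrative guesses, not the rules used in this PR.

```python
from enum import Enum


class ErrorType(Enum):
    # Member names follow the list above; the string values are assumptions.
    TIMEOUT = "timeout"
    NETWORK = "network"
    VALIDATION = "validation"
    RESOURCE = "resource"
    PERMISSION = "permission"
    NOT_FOUND = "not_found"
    RATE_LIMIT = "rate_limit"
    UNKNOWN = "unknown"


# Hypothetical keyword table, checked in order against the lowercased message.
_KEYWORDS = [
    (ErrorType.RATE_LIMIT, ("rate limit", "too many requests", "429")),
    (ErrorType.TIMEOUT, ("timed out", "timeout")),
    (ErrorType.NETWORK, ("connection", "network", "dns")),
    (ErrorType.PERMISSION, ("permission", "forbidden", "unauthorized")),
    (ErrorType.NOT_FOUND, ("not found", "no such file", "404")),
    (ErrorType.RESOURCE, ("out of memory", "quota", "no space left")),
    (ErrorType.VALIDATION, ("invalid", "validation")),
]


def classify_error(exc: BaseException) -> ErrorType:
    """Map an exception to an ErrorType (illustrative heuristic)."""
    # Built-in exception types give the most reliable signal.
    if isinstance(exc, TimeoutError):
        return ErrorType.TIMEOUT
    if isinstance(exc, PermissionError):
        return ErrorType.PERMISSION
    if isinstance(exc, FileNotFoundError):
        return ErrorType.NOT_FOUND
    if isinstance(exc, MemoryError):
        return ErrorType.RESOURCE
    if isinstance(exc, ConnectionError):
        return ErrorType.NETWORK
    if isinstance(exc, (ValueError, TypeError)):
        return ErrorType.VALIDATION

    # Otherwise fall back to keyword matching on the message text.
    message = str(exc).lower()
    for error_type, keywords in _KEYWORDS:
        if any(keyword in message for keyword in keywords):
            return error_type
    return ErrorType.UNKNOWN


print(classify_error(TimeoutError("read timed out")))
print(classify_error(RuntimeError("429 Too Many Requests")))
```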

3. Adaptive Retry Strategies

Context-Aware Retries: Different retry strategies based on error type:
- Exponential Backoff: For timeouts and rate limits (2^n delay)
- Linear Backoff: For network issues (n * base_delay)
- Immediate Retry: For transient unknown errors
- No Retry: For permanent errors (validation, permission, not found)

Configurable Limits: Maximum retry attempts and delay caps
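
The strategies above can be sketched as a delay function plus a small retry loop. This is a sketch under stated assumptions, not the code in this PR: `retry_delay`, `run_with_retry`, `STRATEGY_FOR_ERROR`, and the defaults (`base_delay=1.0`, `max_delay=60.0`, `max_retries=3`) are hypothetical, and the exponential delay is assumed to be `base_delay * 2^n`. The description does not say how RESOURCE errors are handled, so the sketch falls back to no retry for them.

```python
import time
from enum import Enum
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


class RetryStrategy(Enum):
    EXPONENTIAL = "exponential"
    LINEAR = "linear"
    IMMEDIATE = "immediate"
    NONE = "none"


# Error category -> strategy, following the list above.
# RESOURCE is intentionally absent (unspecified); lookups default to NONE.
STRATEGY_FOR_ERROR = {
    "TIMEOUT": RetryStrategy.EXPONENTIAL,
    "RATE_LIMIT": RetryStrategy.EXPONENTIAL,
    "NETWORK": RetryStrategy.LINEAR,
    "UNKNOWN": RetryStrategy.IMMEDIATE,
    "VALIDATION": RetryStrategy.NONE,
    "PERMISSION": RetryStrategy.NONE,
    "NOT_FOUND": RetryStrategy.NONE,
}


def retry_delay(
    strategy: RetryStrategy,
    attempt: int,
    base_delay: float = 1.0,
    max_delay: float = 60.0,
) -> Optional[float]:
    """Seconds to wait before retry number `attempt` (1-based); None = no retry."""
    if strategy is RetryStrategy.NONE:
        return None
    if strategy is RetryStrategy.IMMEDIATE:
        return 0.0
    if strategy is RetryStrategy.LINEAR:
        delay = attempt * base_delay
    else:  # EXPONENTIAL
        delay = base_delay * 2 ** attempt
    return min(delay, max_delay)  # apply the delay cap


def run_with_retry(
    func: Callable[[], T],
    classify: Callable[[Exception], str],
    max_retries: int = 3,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call `func`, retrying according to the classified error category."""
    attempt = 0
    while True:
        try:
            return func()
        except Exception as exc:
            attempt += 1
            strategy = STRATEGY_FOR_ERROR.get(classify(exc), RetryStrategy.NONE)
            delay = retry_delay(strategy, attempt)
            if delay is None or attempt > max_retries:
                raise  # permanent error or retry budget exhausted
            sleep(delay)


# Usage: a call that times out twice, then succeeds.
calls = {"n": 0}


def flaky() -> str:
    calls["n"] += 1
    if calls["n"] < 3:
        raise TimeoutError("simulated timeout")
    return "ok"


waits = []  # record delays instead of actually sleeping
result = run_with_retry(
    flaky,
    classify=lambda e: "TIMEOUT" if isinstance(e, TimeoutError) else "VALIDATION",
    sleep=waits.append,
)
print(result, waits)
```

Passing `sleep` as a parameter keeps the loop testable without real waiting.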

4. Intelligent Result Caching

Parameter-Based Caching: Caches tool results based on parameter hash
TTL Support: Configurable time-to-live for cached results (default: 1 hour)
Cache Hit Tracking: Monitors cache effectiveness
Selective Cache Management: Clear cache for specific tools or all tools
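
A parameter-hash cache with a TTL could look roughly like the sketch below. The `ResultCache` class, the `cache_key` helper, and the choice of SHA-256 over a sorted JSON dump are assumptions made for illustration; the PR may hash parameters and store entries differently.

```python
import hashlib
import json
import time
from typing import Any, Callable, Dict, Optional, Tuple


def cache_key(tool_name: str, params: Dict[str, Any]) -> str:
    """Stable hash of tool name + parameters (key order does not matter)."""
    payload = json.dumps(params, sort_keys=True, default=str)
    return hashlib.sha256(f"{tool_name}:{payload}".encode("utf-8")).hexdigest()


class ResultCache:
    """Hypothetical in-memory cache with a time-to-live and hit tracking."""

    def __init__(
        self,
        ttl_seconds: float = 3600.0,  # 1 hour, matching the stated default
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.ttl_seconds = ttl_seconds
        self._clock = clock
        # key -> (tool_name, stored_at, value)
        self._entries: Dict[str, Tuple[str, float, Any]] = {}
        self.hits = 0
        self.misses = 0

    def get(self, tool_name: str, params: Dict[str, Any]) -> Tuple[bool, Any]:
        """Return (found, value); expired entries count as misses."""
        key = cache_key(tool_name, params)
        entry = self._entries.get(key)
        if entry is not None and self._clock() - entry[1] < self.ttl_seconds:
            self.hits += 1
            return True, entry[2]
        self._entries.pop(key, None)  # drop expired entry, if any
        self.misses += 1
        return False, None

    def put(self, tool_name: str, params: Dict[str, Any], value: Any) -> None:
        key = cache_key(tool_name, params)
        self._entries[key] = (tool_name, self._clock(), value)

    def clear(self, tool_name: Optional[str] = None) -> None:
        """Clear one tool's entries, or everything when tool_name is None."""
        if tool_name is None:
            self._entries.clear()
        else:
            self._entries = {
                k: e for k, e in self._entries.items() if e[0] != tool_name
            }

    @property
    def hit_rate(self) -> float:
        total = self.hits + self.misses
        return self.hits / total if total else 0.0


cache = ResultCache()
cache.put("query_gene_expression", {"gene": "BRCA1"}, {"tpm": 12.3})
print(cache.get("query_gene_expression", {"gene": "BRCA1"}))
```

Returning a `(found, value)` pair rather than a bare value lets the cache store `None` results without confusing them with misses.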

Technical Implementation

New Module: `execution_analytics.py`

ExecutionAnalytics: Main analytics engine class
ToolAnalytics: Per-tool analytics data structure
ExecutionRecord: Individual execution record
ErrorType: Error classification enum
RetryStrategy: Retry strategy enum

Integration Points

A1 Agent: Integrated analytics system into agent initialization
Tool Registry: Optional analytics support for tool registry
Public API: New methods on A1 agent:
- get_execution_analytics(tool_name=None): Get analytics for tool(s)
- get_analytics_summary(): Get summary DataFrame
- get_error_analysis(): Get error analysis DataFrame
- clear_execution_cache(tool_name=None): Clear cached results
- reset_execution_analytics(): Reset all analytics

Usage Example

```python
from biomni.agent import A1

# Initialize agent (analytics enabled by default)
agent = A1(path='./data', llm='claude-sonnet-4-20250514')

# Execute tasks (analytics tracked automatically)
agent.go("Query gene expression data for BRCA1")

# Get analytics summary
summary_df = agent.get_analytics_summary()
print(summary_df)

# Get error analysis
error_df = agent.get_error_analysis()
print(error_df)

# Get specific tool analytics
tool_analytics = agent.get_execution_analytics("query_gene_expression")
print(f"Success rate: {tool_analytics['query_gene_expression'].success_rate}")

# Clear cache if needed
agent.clear_execution_cache("query_gene_expression")
```

Benefits

Performance Optimization: Identify slow or frequently failing tools
Cost Reduction: Cache results to avoid redundant API calls
Reliability: Automatic retry with intelligent strategies
Debugging: Comprehensive error analysis for troubleshooting
Research Insights: Analytics data for understanding tool usage patterns

Research

This feature addresses several gaps in the current Biomni architecture:

No existing retry mechanism: Tools fail without intelligent retry
No performance tracking: No visibility into tool execution patterns
No result caching: Redundant calls to expensive operations
No error analysis: Limited understanding of failure modes

The adaptive retry system uses error classification to choose a retry strategy suited to each failure. This is intended to improve reliability for biomedical research workflows, which often involve network calls, API rate limits, and resource-intensive computations.

Testing

Module imports successfully
Analytics system initializes correctly
Integration with A1 agent verified
No breaking changes to existing functionality

Files Changed

biomni/tool/execution_analytics.py (new): Core analytics and retry system
biomni/agent/a1.py: Integration of analytics system
biomni/tool/tool_registry.py: Optional analytics support

Future Enhancements

Integration with tool execution wrappers for automatic instrumentation
Machine learning-based retry strategy optimization
Distributed analytics aggregation for multi-agent scenarios
Export analytics to external monitoring systems

- Implement comprehensive execution analytics tracking (success rate, latency, error patterns)
- Add intelligent error classification system (timeout, network, validation, etc.)
- Implement adaptive retry strategies with exponential/linear backoff
- Add result caching system to avoid redundant tool calls
- Integrate analytics into A1 agent with public API methods
- Provide analytics summary and error analysis DataFrames
- Support cache management and analytics reset functionality

This feature enables:
- Performance monitoring and optimization of tool usage
- Automatic retry with context-aware strategies
- Reduced redundant API calls through intelligent caching
- Comprehensive error analysis for debugging and improvement

for more information, see https://pre-commit.ci

Rakshitha Ireddi and others added 2 commits February 13, 2026 23:43

[pre-commit.ci] auto fixes from pre-commit.com hooks

5607841

Rakshitha-Ireddi wants to merge 2 commits into snap-stanford:main from Rakshitha-Ireddi:feature/execution-analytics-adaptive-retry
