Summary
Add MCTS-style (Monte Carlo Tree Search) decision tree visualization capabilities to AgentGym, enabling researchers to visualize and analyze agentic in-context feedback learning. Inspired by Weaviate's Elysia decision tree framework.
Motivation
AgentGym currently has trajectory visualization (env-visualization), but lacks:
- Decision tree visualization - showing branching agent decisions and their outcomes
- In-context feedback loops - visualizing how agents learn from feedback during episodes
- MCTS-style exploration - visualizing exploration vs exploitation in agent reasoning
This is critical for:
- Debugging agent behavior - understanding why agents make specific decisions
- Analyzing RL training - visualizing policy evolution and value estimates
- Research reproducibility - clear visualization of agent reasoning paths
Proposed Solution
Core Concepts (Inspired by Elysia)
Elysia uses a decision tree architecture where:
- Each node represents a decision point with available actions
- A decision agent evaluates the environment, the available actions, and past/future actions
- The tree structure enables strategic tool selection
We can adapt this for AgentGym with MCTS-style visualization:
              [Root: Initial Observation]
                 /                \
          [Action A]          [Action B]
           /       \                \
    [State A1]  [State A2]      [State B1]
        ↓            ↓                ↓
    reward=0.3   reward=0.7      reward=0.5
        ↓            ↓                ↓
    [Rollout]     [Expand]       [Backprop]
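A minimal TypeScript sketch of the node shape this diagram implies (the interface and field names are illustrative, not an existing API; they mirror the data format extension proposed below):

```typescript
// Hypothetical shape of a node in the visualized decision tree.
interface MCTSNode {
  state: string;          // observation / environment state at this node
  action?: string;        // action that led to this node (absent at the root)
  reward: number;         // immediate reward observed after the action
  visits: number;         // N: visit count from search or repeated rollouts
  value: number;          // Q: running value estimate
  children: MCTSNode[];   // expanded successor nodes
}
```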
Key Features
- Decision Tree Rendering
  - Visualize agent decision points as tree nodes
  - Show available actions at each state
  - Color-code by value/reward estimates
- MCTS Metrics Display
  - Visit counts (N) for each node
  - Value estimates (Q)
  - UCB scores for exploration (see the UCB sketch after this list)
  - Backpropagation paths
- In-Context Feedback Visualization
  - Show how observations inform subsequent decisions
  - Highlight feedback loops (observation → action → reward → updated belief)
  - Display attention/reasoning traces if available
- Interactive Features
  - Zoom into subtrees
  - Step through rollouts
  - Compare different policies side-by-side
  - Export to static formats (SVG, PNG)
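As a rough sketch of what mcts-utils.js could compute for the metrics display, the standard UCB1 score is Q + c·sqrt(ln(N_parent) / N_child). A TypeScript version (function names and the exploration constant are assumptions, using the node shape sketched above):

```typescript
// Illustrative UCB1 computation for the metrics overlay (not an existing API).
// Unvisited children score Infinity so the tree view can flag them as unexplored.
function ucbScore(parent: MCTSNode, child: MCTSNode, c: number = Math.SQRT2): number {
  if (child.visits === 0) return Infinity;
  return child.value + c * Math.sqrt(Math.log(parent.visits) / child.visits);
}

// Example: order children by UCB so DecisionTree.vue can color-code exploration priority.
function rankChildren(node: MCTSNode): MCTSNode[] {
  return [...node.children].sort((a, b) => ucbScore(node, b) - ucbScore(node, a));
}
```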
Architecture Enhancement
Extend existing env-visualization with new components:
env-visualization/
├── src/
│   ├── components/
│   │   ├── TrajectoryViewer.vue   # Existing
│   │   ├── DecisionTree.vue       # NEW: MCTS tree viz
│   │   ├── MCTSMetrics.vue        # NEW: N, Q, UCB display
│   │   └── FeedbackLoop.vue       # NEW: In-context learning viz
│   ├── lib/
│   │   ├── tree-layout.js         # D3.js tree layout
│   │   └── mcts-utils.js          # MCTS metric computation
│   └── stores/
│       └── decision-tree.js       # Tree state management
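For tree-layout.js, a rough sketch of how d3-hierarchy could position nodes (written in TypeScript for clarity; the spacing values and the MCTSNode type from the sketch above are assumptions):

```typescript
import { hierarchy, tree } from 'd3-hierarchy';
import type { HierarchyPointNode } from 'd3-hierarchy';

// Compute x/y positions for each node so DecisionTree.vue can draw nodes and edges.
// nodeSize([dx, dy]) spaces siblings dx apart and tree levels dy apart (placeholder values).
export function layoutTree(root: MCTSNode): HierarchyPointNode<MCTSNode>[] {
  const h = hierarchy<MCTSNode>(root, d => d.children);
  const layout = tree<MCTSNode>().nodeSize([40, 160]);
  return layout(h).descendants(); // each entry carries .x, .y and .data (the MCTSNode)
}
```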
Data Format Extension
Extend trajectory format to include tree structure:
{
  "trajectory_id": "traj_001",
  "tree": {
    "root": {
      "state": "initial observation",
      "children": [
        {
          "action": "search[query]",
          "state": "search results...",
          "visits": 15,
          "value": 0.73,
          "reward": 0.0,
          "children": [...]
        }
      ]
    }
  },
  "feedback_events": [
    {"step": 3, "type": "reward", "value": 0.5, "context_update": "..."}
  ]
}

Reference: Elysia Architecture
From Elysia documentation:
Unlike simple agentic platforms which have access to all possible tools at runtime, Elysia has a pre-defined web of possible nodes, each with a corresponding action. Each node in the tree is orchestrated by a decision agent with global context awareness.
We can adopt similar principles:
- Pre-defined action space at each state (from AgentGym env)
- Decision agent (the policy being trained) chooses actions
- Tree visualization shows the exploration/exploitation tradeoff
Implementation Phases
Phase 1: Static Tree Visualization
- Parse trajectory into tree structure (see the parsing sketch below)
- Render with D3.js
- Basic node/edge styling
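A minimal sketch of the parsing step, assuming the extended trajectory JSON above (the helper name and defaults are hypothetical):

```typescript
// Convert the "tree.root" object of an extended trajectory into MCTSNode objects.
// MCTS fields default to zero so trajectories without search metadata still render.
function parseTreeNode(raw: any): MCTSNode {
  return {
    state: raw.state ?? '',
    action: raw.action,
    reward: raw.reward ?? 0,
    visits: raw.visits ?? 0,
    value: raw.value ?? 0,
    children: (raw.children ?? []).map(parseTreeNode),
  };
}

// Usage: const root = parseTreeNode(trajectory.tree.root);
```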
Phase 2: MCTS Metrics
- Compute/display visit counts, values
- UCB score visualization
- Backpropagation path highlighting
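For the backpropagation highlighting, one possible approach (purely a sketch; it relies on d3-hierarchy's ancestors() and the MCTSNode shape above):

```typescript
import type { HierarchyNode } from 'd3-hierarchy';

// Path to highlight: the selected node plus its ancestors up to the root,
// in the order value/visit updates would propagate.
function backpropPath(node: HierarchyNode<MCTSNode>): HierarchyNode<MCTSNode>[] {
  return node.ancestors(); // d3-hierarchy returns [node, parent, ..., root]
}

// If the viewer also replays updates, the classic MCTS rule is an incremental mean:
function backpropagate(path: MCTSNode[], rolloutReturn: number): void {
  for (const n of path) {
    n.visits += 1;
    n.value += (rolloutReturn - n.value) / n.visits;
  }
}
```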
Phase 3: Interactive Features
- Step-through animation
- Policy comparison mode
- Export functionality
Related
- env-visualization - Existing trajectory viewer
- Weaviate Elysia - Decision tree agentic framework
- #71 - RAG environment (complementary feature)