[Feature Request] MCTS-style Decision Tree Visualization for In-Context Feedback Learning #72

@supmo668

Summary

Add MCTS-style (Monte Carlo Tree Search) decision tree visualization to AgentGym so that researchers can visualize and analyze agentic in-context feedback learning. The idea is inspired by Weaviate's Elysia decision tree framework.

Motivation

AgentGym currently has trajectory visualization (env-visualization), but lacks:

  1. Decision tree visualization - showing branching agent decisions and their outcomes
  2. In-context feedback loops - visualizing how agents learn from feedback during episodes
  3. MCTS-style exploration - visualizing exploration vs exploitation in agent reasoning

These capabilities matter for:

  • Debugging agent behavior - understanding why agents make specific decisions
  • Analyzing RL training - visualizing policy evolution and value estimates
  • Research reproducibility - clear visualization of agent reasoning paths

Proposed Solution

Core Concepts (Inspired by Elysia)

Elysia uses a decision tree architecture where:

  • Each node represents a decision point with available actions
  • A decision agent evaluates the environment, the available actions, and past and future actions
  • The tree structure enables strategic tool selection

We can adapt this for AgentGym with MCTS-style visualization:

                    [Root: Initial Observation]
                           /        \
                  [Action A]        [Action B]
                  /       \              \
           [State A1]   [State A2]    [State B1]
              ↓            ↓              ↓
         reward=0.3    reward=0.7    reward=0.5
              ↓            ↓              ↓
          [Rollout]    [Expand]      [Backprop]

Key Features

  1. Decision Tree Rendering

    • Visualize agent decision points as tree nodes
    • Show available actions at each state
    • Color-code by value/reward estimates
  2. MCTS Metrics Display

    • Visit counts (N) for each node
    • Value estimates (Q)
    • UCB scores for exploration
    • Backpropagation paths
  3. In-Context Feedback Visualization

    • Show how observations inform subsequent decisions
    • Highlight feedback loops (observation → action → reward → updated belief)
    • Display attention/reasoning traces if available
  4. Interactive Features

    • Zoom into subtrees
    • Step through rollouts
    • Compare different policies side-by-side
    • Export to static formats (SVG, PNG)
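
For the UCB scores listed under MCTS Metrics Display, one common choice is the UCB1 formula, Q + c * sqrt(ln(N_parent) / N_child). The sketch below assumes that formula and an exploration constant of sqrt(2); neither is specified in this proposal, and the function name `ucb1` and the sample numbers are illustrative only.

```python
import math

def ucb1(q: float, n_child: int, n_parent: int, c: float = math.sqrt(2)) -> float:
    """UCB1 score: mean value estimate plus an exploration bonus."""
    if n_child == 0:
        # Unvisited children get an infinite score so they are tried first.
        return math.inf
    return q + c * math.sqrt(math.log(n_parent) / n_child)

# Two sibling nodes; visit counts and values are made-up sample data.
children = [
    {"action": "A", "visits": 15, "value": 0.73},
    {"action": "B", "visits": 5, "value": 0.50},
]
n_parent = sum(ch["visits"] for ch in children)
scores = {ch["action"]: ucb1(ch["value"], ch["visits"], n_parent) for ch in children}
best = max(scores, key=scores.get)
print(scores, best)
```

In this sample the less-visited action B gets the higher score even though its value estimate is lower, which is the exploration effect the visualization would need to make visible.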

Architecture Enhancement

Extend existing env-visualization with new components:

env-visualization/
├── src/
│   ├── components/
│   │   ├── TrajectoryViewer.vue     # Existing
│   │   ├── DecisionTree.vue         # NEW: MCTS tree viz
│   │   ├── MCTSMetrics.vue          # NEW: N, Q, UCB display
│   │   └── FeedbackLoop.vue         # NEW: In-context learning viz
│   ├── lib/
│   │   ├── tree-layout.js           # D3.js tree layout
│   │   └── mcts-utils.js            # MCTS metric computation
│   └── stores/
│       └── decision-tree.js         # Tree state management

Data Format Extension

Extend trajectory format to include tree structure:

{
  "trajectory_id": "traj_001",
  "tree": {
    "root": {
      "state": "initial observation",
      "children": [
        {
          "action": "search[query]",
          "state": "search results...",
          "visits": 15,
          "value": 0.73,
          "reward": 0.0,
          "children": [...]
        }
      ]
    }
  },
  "feedback_events": [
    {"step": 3, "type": "reward", "value": 0.5, "context_update": "..."}
  ]
}
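
As a rough illustration of how a consumer might read this format, the sketch below walks the nested `children` lists depth-first and flattens each node into a row. It assumes every node may omit `action`, `visits`, and `value` (as the root does above); the `iter_nodes` helper and the `click[item]` child are hypothetical and only there to make the example runnable.

```python
import json

# Hypothetical file content following the proposed format.
# The "click[item]" child is invented for illustration only.
raw = """
{
  "trajectory_id": "traj_001",
  "tree": {
    "root": {
      "state": "initial observation",
      "children": [
        {
          "action": "search[query]",
          "state": "search results...",
          "visits": 15,
          "value": 0.73,
          "reward": 0.0,
          "children": [
            {"action": "click[item]", "state": "item page", "visits": 4,
             "value": 0.9, "reward": 1.0, "children": []}
          ]
        }
      ]
    }
  },
  "feedback_events": [
    {"step": 3, "type": "reward", "value": 0.5, "context_update": "..."}
  ]
}
"""

def iter_nodes(node, depth=0):
    """Yield (depth, node) pairs in depth-first, pre-order traversal."""
    yield depth, node
    for child in node.get("children", []):
        yield from iter_nodes(child, depth + 1)

doc = json.loads(raw)
rows = [
    (depth, node.get("action"), node.get("visits"), node.get("value"))
    for depth, node in iter_nodes(doc["tree"]["root"])
]
for row in rows:
    print(row)
```

A flat list of rows like this is convenient for tables or tooltips, while the nested form stays the source of truth for layout.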

Reference: Elysia Architecture

From Elysia documentation:

Unlike simple agentic platforms which have access to all possible tools at runtime, Elysia has a pre-defined web of possible nodes, each with a corresponding action. Each node in the tree is orchestrated by a decision agent with global context awareness.

We can adopt similar principles:

  • Pre-defined action space at each state (from AgentGym env)
  • Decision agent (the policy being trained) chooses actions
  • Tree visualization shows the exploration/exploitation tradeoff

Implementation Phases

Phase 1: Static Tree Visualization

  • Parse trajectory into tree structure
  • Render with D3.js
  • Basic node/edge styling
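
One possible way to do the "parse trajectory into tree structure" step is to merge several linear trajectories into a prefix tree, so that shared action prefixes become shared branches. The sketch below assumes each trajectory is a list of (action, resulting state) pairs and that the same action from the same node always leads to the same child; AgentGym's actual trajectory schema may differ, and the helper names are hypothetical.

```python
def new_node(action=None, state=None):
    """Create a tree node shaped like the proposed data format."""
    return {"action": action, "state": state, "children": []}

def insert_trajectory(root, steps):
    """Merge one linear trajectory into the tree, reusing common prefixes."""
    node = root
    for action, state in steps:
        # Reuse an existing child for this action, or create a new branch.
        child = next((c for c in node["children"] if c["action"] == action), None)
        if child is None:
            child = new_node(action, state)
            node["children"].append(child)
        node = child
    return node  # leaf reached by this trajectory

# Made-up sample trajectories: two share the first action, one diverges at the root.
trajectories = [
    [("search[x]", "results"), ("click[1]", "page 1")],
    [("search[x]", "results"), ("click[2]", "page 2")],
    [("back", "home")],
]
root = new_node(state="initial observation")
for steps in trajectories:
    insert_trajectory(root, steps)
```

The resulting nested dictionaries have the same `children` shape as the proposed format, so the same structure could be serialized to JSON and handed to the D3.js layout.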

Phase 2: MCTS Metrics

  • Compute and display visit counts and value estimates
  • UCB score visualization
  • Backpropagation path highlighting
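
The visit counts and values in Phase 2 could be filled in with a standard MCTS backpropagation pass. This is a minimal sketch, assuming the value of a node is the running mean of the rewards of all rollouts that passed through it; the node `id` field and the `backpropagate` helper are hypothetical and not part of the proposed format.

```python
def make_node(node_id):
    return {"id": node_id, "visits": 0, "value": 0.0}

def backpropagate(path, reward):
    """Update every node on the root-to-leaf path with one rollout reward.

    Returns the node ids on the path, which a UI could use for highlighting.
    """
    for node in path:
        node["visits"] += 1
        # Incremental mean: new_mean = old_mean + (x - old_mean) / n
        node["value"] += (reward - node["value"]) / node["visits"]
    return [node["id"] for node in path]

root, a, b = make_node("root"), make_node("A"), make_node("B")

# Two made-up rollouts: one through A with reward 1.0, one through B with reward 0.0.
highlight_1 = backpropagate([root, a], 1.0)
highlight_2 = backpropagate([root, b], 0.0)
print(root, a, b)
```

After both rollouts the root has been visited twice with a mean value of 0.5, while each child reflects only its own rollout.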

Phase 3: Interactive Features

  • Step-through animation
  • Policy comparison mode
  • Export functionality
