[Feature Request] MCTS-style Decision Tree Visualization for In-Context Feedback Learning

## Summary

Add MCTS-style (Monte Carlo Tree Search) decision tree visualization capabilities to AgentGym, enabling researchers to visualize and analyze agentic in-context feedback learning. The proposal is inspired by the decision tree framework in [Weaviate Elysia](https://github.com/weaviate/elysia).

## Motivation

AgentGym currently has trajectory visualization (env-visualization), but lacks:

1. **Decision tree visualization** - showing branching agent decisions and their outcomes
2. **In-context feedback loops** - visualizing how agents learn from feedback during episodes
3. **MCTS-style exploration** - visualizing exploration vs exploitation in agent reasoning

This is critical for:
- **Debugging agent behavior** - understanding why agents make specific decisions
- **Analyzing RL training** - visualizing policy evolution and value estimates
- **Research reproducibility** - clear visualization of agent reasoning paths

## Proposed Solution

### Core Concepts (Inspired by Elysia)

Elysia uses a **decision tree architecture** where:
- Each node represents a decision point with available actions
- A decision agent evaluates the environment, the available actions, and past and planned future actions
- The tree structure enables strategic tool selection

We can adapt this for AgentGym with MCTS-style visualization:

```
                    [Root: Initial Observation]
                           /        \
                  [Action A]        [Action B]
                  /       \              \
           [State A1]   [State A2]    [State B1]
              ↓            ↓              ↓
         reward=0.3    reward=0.7    reward=0.5
              ↓            ↓              ↓
          [Rollout]    [Expand]      [Backprop]
```

### Key Features

1. **Decision Tree Rendering**
   - Visualize agent decision points as tree nodes
   - Show available actions at each state
   - Color-code by value/reward estimates

2. **MCTS Metrics Display**
   - Visit counts (N) for each node
   - Value estimates (Q)
   - UCB scores for exploration
   - Backpropagation paths

3. **In-Context Feedback Visualization**
   - Show how observations inform subsequent decisions
   - Highlight feedback loops (observation → action → reward → updated belief)
   - Display attention/reasoning traces if available

4. **Interactive Features**
   - Zoom into subtrees
   - Step through rollouts
   - Compare different policies side-by-side
   - Export to static formats (SVG, PNG)
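
The UCB scores listed under MCTS Metrics Display could be derived from the N and Q values already attached to each node. Below is a minimal sketch, assuming the standard UCB1 formula; this proposal does not specify which UCB variant to use, and both the function name `ucb1` and the default exploration constant `c = sqrt(2)` are illustrative assumptions.

```python
import math


def ucb1(q: float, n: int, parent_n: int, c: float = math.sqrt(2)) -> float:
    """UCB1 score for a child node (assumed variant, not fixed by this proposal).

    q        -- mean value estimate (Q) of the child
    n        -- visit count (N) of the child
    parent_n -- visit count of the parent node
    c        -- exploration constant (hypothetical default)
    """
    if n == 0:
        # Unvisited children get infinite priority so each is tried once.
        return math.inf
    # Exploitation term (q) plus an exploration bonus that shrinks as n grows.
    return q + c * math.sqrt(math.log(parent_n) / n)


# With equal Q, the less-visited child scores higher.
print(ucb1(0.5, 1, 100) > ucb1(0.5, 50, 100))
```

Showing the two terms separately in the UI (Q versus the exploration bonus) may make the exploration/exploitation tradeoff easier to read than a single combined score.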

### Architecture Enhancement

Extend the existing `env-visualization` with new components:

```
env-visualization/
├── src/
│   ├── components/
│   │   ├── TrajectoryViewer.vue     # Existing
│   │   ├── DecisionTree.vue         # NEW: MCTS tree viz
│   │   ├── MCTSMetrics.vue          # NEW: N, Q, UCB display
│   │   └── FeedbackLoop.vue         # NEW: In-context learning viz
│   ├── lib/
│   │   ├── tree-layout.js           # D3.js tree layout
│   │   └── mcts-utils.js            # MCTS metric computation
│   └── stores/
│       └── decision-tree.js         # Tree state management
```

### Data Format Extension

Extend the trajectory format to include the tree structure:

```json
{
  "trajectory_id": "traj_001",
  "tree": {
    "root": {
      "state": "initial observation",
      "children": [
        {
          "action": "search[query]",
          "state": "search results...",
          "visits": 15,
          "value": 0.73,
          "reward": 0.0,
          "children": [...]
        }
      ]
    }
  },
  "feedback_events": [
    {"step": 3, "type": "reward", "value": 0.5, "context_update": "..."}
  ]
}
```
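
To illustrate how a consumer might read this format, here is a minimal sketch that walks the `tree` object depth-first and yields one flat record per node. The helper name `iter_nodes` and the derived `path` and `depth` fields are hypothetical. The example input mirrors the JSON above, with the elided `children` replaced by an empty list purely so the snippet runs.

```python
from typing import Any, Dict, Iterator, Tuple


def iter_nodes(
    node: Dict[str, Any], path: Tuple[str, ...] = ()
) -> Iterator[Dict[str, Any]]:
    """Yield a flat record for `node` and every descendant, depth-first."""
    yield {
        "path": path,                  # actions taken from the root to this node
        "depth": len(path),
        "state": node.get("state"),
        "visits": node.get("visits"),  # None when the field is absent (e.g. root)
        "value": node.get("value"),
    }
    for child in node.get("children", []):
        yield from iter_nodes(child, path + (child.get("action"),))


# Example input following the proposed format (children truncated to []).
example = {
    "trajectory_id": "traj_001",
    "tree": {
        "root": {
            "state": "initial observation",
            "children": [
                {
                    "action": "search[query]",
                    "state": "search results...",
                    "visits": 15,
                    "value": 0.73,
                    "reward": 0.0,
                    "children": [],
                }
            ],
        }
    },
}

for record in iter_nodes(example["tree"]["root"]):
    print(record["depth"], record["path"], record["visits"], record["value"])
```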

## Reference: Elysia Architecture

From Elysia documentation:
> Unlike simple agentic platforms which have access to all possible tools at runtime, Elysia has a pre-defined web of possible nodes, each with a corresponding action. Each node in the tree is orchestrated by a decision agent with global context awareness.

We can adopt similar principles:
- Pre-defined action space at each state (from AgentGym env)
- Decision agent (the policy being trained) chooses actions
- Tree visualization shows the exploration/exploitation tradeoff

## Implementation Phases

**Phase 1: Static Tree Visualization**
- Parse trajectory into tree structure
- Render with D3.js
- Basic node/edge styling
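
One possible approach to the parsing step is to merge several linear trajectories into a prefix tree, so that episodes sharing the same leading actions share nodes. The sketch below rests on assumptions: the input shape (a list of `(action, observation, reward)` steps per trajectory) and the function name `build_tree` are hypothetical and do not reflect AgentGym's actual trajectory format. The output keys follow the Data Format Extension section. Merging by action alone also assumes a deterministic environment.

```python
from typing import Any, Dict, List, Tuple

# Assumed step shape: (action, resulting observation, reward).
Step = Tuple[str, str, float]


def build_tree(initial_state: str, trajectories: List[List[Step]]) -> Dict[str, Any]:
    """Merge linear trajectories into a prefix tree keyed by action."""
    root: Dict[str, Any] = {"state": initial_state, "visits": 0, "children": []}
    for steps in trajectories:
        node = root
        node["visits"] += 1
        for action, state, reward in steps:
            # Reuse the child reached by the same action, if one exists.
            child = next((c for c in node["children"] if c["action"] == action), None)
            if child is None:
                child = {"action": action, "state": state, "reward": reward,
                         "visits": 0, "children": []}
                node["children"].append(child)
            child["visits"] += 1
            node = child
    return root


# Three hypothetical episodes; the first two share the "search[a]" prefix.
tree = build_tree("initial observation", [
    [("search[a]", "results a", 0.0), ("click[1]", "page 1", 0.3)],
    [("search[a]", "results a", 0.0), ("click[2]", "page 2", 0.7)],
    [("search[b]", "results b", 0.5)],
])
print(len(tree["children"]), tree["children"][0]["visits"])
```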

**Phase 2: MCTS Metrics**
- Compute/display visit counts, values
- UCB score visualization
- Backpropagation path highlighting
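
The visit counts and values in this phase could be produced by a standard MCTS backpropagation step. This is a minimal sketch, assuming Q is kept as a running mean of backed-up returns and that each node holds a reference to its parent. The `Node` class and `backpropagate` function are hypothetical names, not existing AgentGym APIs. The returned path is the sequence of nodes a highlighting feature would draw.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Node:
    name: str
    parent: Optional["Node"] = None
    visits: int = 0      # N
    value: float = 0.0   # Q, running mean of backed-up returns


def backpropagate(leaf: Node, ret: float) -> List[str]:
    """Update N and Q from `leaf` up to the root; return the visited path."""
    path: List[str] = []
    node: Optional[Node] = leaf
    while node is not None:
        node.visits += 1
        # Incremental mean update: Q <- Q + (G - Q) / N
        node.value += (ret - node.value) / node.visits
        path.append(node.name)
        node = node.parent
    return path


root = Node("root")
a = Node("Action A", parent=root)
a2 = Node("State A2", parent=a)
print(backpropagate(a2, 0.7))  # ['State A2', 'Action A', 'root']
```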

**Phase 3: Interactive Features**
- Step-through animation
- Policy comparison mode
- Export functionality

## Related

- [env-visualization](https://github.com/WooooDyy/AgentGym/tree/main/env-visualization) - Existing trajectory viewer
- [Weaviate Elysia](https://github.com/weaviate/elysia) - Decision tree agentic framework
- [#71](https://github.com/WooooDyy/AgentGym/issues/71) - RAG environment (complementary feature)
