🌐 WorldMind

Aligning Agentic World Models via Knowledgeable Experience Learning

WorldMind Framework

WorldMind is a framework for aligning agentic world models through knowledgeable experience learning, enabling agents to learn directly from the environment.


📖 Overview

WorldMind introduces a paradigm shift in how embodied AI agents learn and adapt. Unlike traditional approaches that rely on extensive environment interaction or domain-specific fine-tuning, WorldMind operates as a training-free framework that enables agents to:

  • Learn from Experience: Extract reusable symbolic knowledge from both successful task completions and prediction errors without gradient updates.
  • Generalize Across Tasks: Apply learned causal rules and heuristics to novel situations through semantic similarity-based retrieval.
  • Continuously Improve: Accumulate and refine the World Knowledge Repository (WKR) throughout deployment.

Key Features

| Feature | Description |
| --- | --- |
| 🧠 Experience Learning | Combines Goal Experience (heuristics) from successful trajectories with Process Experience (causal boundaries) from prediction errors |
| 🔄 Experience-Driven Alignment | Uses State Abstraction and Verifier components to align world model predictions with actual environment dynamics |
| 🌐 Universal Adaptability | Generalizes across diverse embodied environments (ALFRED, Habitat, Navigation) and tasks without environment-specific fine-tuning |
| 🔌 Modular Plugin | Standalone plugin for easy integration into existing agent systems |

Method

WorldMind introduces a two-stage approach for world model alignment:

Stage 1 extracts knowledge during task execution (World Knowledge Building):

  • Goal Experience: From successful trajectories, distill procedural heuristics to guide task optimality.
  • Process Experience: Employ a Predict-Act-Verify loop. When a Verifier detects a semantic discrepancy between the predicted and actual abstract states, a Self-Reflexion mechanism synthesizes corrective causal rules.
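
The Predict-Act-Verify loop described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the repository's implementation: `ToyEnv`, `CausalRule`, `predict`, `verify`, `reflect`, and `run_episode` are hypothetical names, and the rule-based `reflect` stands in for what would be an LLM-backed Self-Reflexion step.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class CausalRule:
    """Hypothetical Process Experience entry: `action` has no effect unless `precondition` holds."""
    action: str
    precondition: str


@dataclass
class ToyEnv:
    """Hypothetical environment: the apple can only be taken once the fridge is open."""
    fridge_open: bool = False
    holding: Optional[str] = None

    def observe(self) -> dict:
        # State abstraction: expose only the symbolic facts the verifier compares.
        return {"fridge_open": self.fridge_open, "holding": self.holding}

    def step(self, action: str) -> dict:
        if action == "open fridge":
            self.fridge_open = True
        elif action == "take apple" and self.fridge_open:
            self.holding = "apple"
        return self.observe()


def predict(state: dict, action: str, rules: list) -> dict:
    """Naive world model: every action succeeds unless a learned rule blocks it."""
    for rule in rules:
        if rule.action == action and not state.get(rule.precondition):
            return dict(state)  # precondition unmet, so predict no change
    predicted = dict(state)
    if action == "open fridge":
        predicted["fridge_open"] = True
    elif action == "take apple":
        predicted["holding"] = "apple"
    return predicted


def verify(predicted: dict, actual: dict) -> bool:
    """Verifier: accept the prediction only if the abstract states match."""
    return predicted == actual


def reflect(action: str, state_before: dict) -> CausalRule:
    """Toy reflection: blame the first unmet fact in the prior state.

    A real system would prompt an LLM here; this heuristic only serves the example.
    """
    unmet = next(key for key, value in state_before.items() if not value)
    return CausalRule(action=action, precondition=unmet)


def run_episode(env: ToyEnv, actions: list, rules: list) -> list:
    """Predict, act, verify; on a mismatch, append a corrective rule."""
    state = env.observe()
    for action in actions:
        predicted = predict(state, action, rules)
        actual = env.step(action)
        if not verify(predicted, actual):
            rules.append(reflect(action, state))
        state = actual
    return rules


rules = run_episode(ToyEnv(), ["take apple", "open fridge", "take apple"], [])
print(rules)
```

In this toy run the first `take apple` fails against the naive prediction, which produces one rule tying that action to `fridge_open`. Replaying the same failing action with the learned rule yields a matching prediction, so no further rule is added.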

Stage 2 applies learned knowledge to new tasks (Inference via Constrained Simulation):

  • Retrieve relevant Process and Goal experiences via semantic similarity.
  • Gated Simulation: Selectively simulate outcomes only when target objects are grounded, enhancing inference efficiency.
  • Augment world model prompts with retrieved knowledge to constrain planning within physical feasibility.
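
The three inference steps above can be sketched as follows. This is only an illustration: the bag-of-words cosine similarity is a stand-in for whatever embedding model the repository actually uses, and `embed`, `retrieve_top_k`, `should_simulate`, and `build_prompt` are hypothetical names.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    """Stand-in embedding: a lowercase bag of words (assumption, not the real encoder)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[token] * b[token] for token in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve_top_k(query: str, experiences: list, k: int = 2) -> list:
    """Return the k stored experiences most similar to the query."""
    q = embed(query)
    return sorted(experiences, key=lambda e: cosine(q, embed(e)), reverse=True)[:k]


def should_simulate(target_object: str, grounded_objects: set) -> bool:
    """Gate: simulate an outcome only if the target object has been grounded."""
    return target_object in grounded_objects


def build_prompt(task: str, experiences: list) -> str:
    """Prepend retrieved knowledge to the task so planning stays within known constraints."""
    lines = ["Known constraints from past experience:"]
    lines += [f"- {experience}" for experience in experiences]
    lines.append(f"Task: {task}")
    return "\n".join(lines)


experiences = [
    "A mug must be picked up before it can be placed in the sink",
    "The fridge must be open before taking an apple from it",
    "Navigate to the counter before searching for a knife",
]
task = "Put the mug in the sink"
top = retrieve_top_k(task, experiences, k=1)

if should_simulate("mug", grounded_objects={"mug", "sink"}):
    print(build_prompt(task, top))
```

The gate keeps the world model from being queried about objects the agent has not yet observed, which is where the efficiency gain described above would come from.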

🖥️ Installation

Note: We need to set up two conda environments:

  • worldmind for EB-ALFRED and EB-Habitat
  • worldmind_nav for EB-Navigation

Please clone over SSH rather than HTTPS to avoid errors during git lfs pull.

Environment Setup

1. Clone Repository

git clone https://github.com/zjunlp/WorldMind.git
cd WorldMind

2. Create Conda Environments

1️⃣ Environment for ALFRED and Habitat (High-Level Planning)

# Create environment named 'worldmind'
conda env create -f conda_envs/environment.yaml
conda activate worldmind
pip install -e .

2️⃣ Environment for Navigation (Low-Level Navigation)

# Create environment named 'worldmind_nav'
conda env create -f conda_envs/environment_eb-nav.yaml
conda activate worldmind_nav
pip install -e .

3. Start Headless Server

For headless servers, start the X server in a separate tmux window:

conda activate worldmind
python -m embodiedbench.envs.eb_alfred.scripts.startx 1

Task-Specific Setup

🏠 EB-ALFRED (Household Tasks)

1. Download Data:

conda activate worldmind
git clone https://huggingface.co/datasets/EmbodiedBench/EB-ALFRED
mv EB-ALFRED embodiedbench/envs/eb_alfred/data/json_2.1.0

2. Verify Installation:

conda activate worldmind

# Remember to start the headless server first!
python -m embodiedbench.envs.eb_alfred.EBAlfEnv

🛋️ EB-Habitat (Rearrangement Tasks)

1. Install Habitat Sim & Lab:

conda activate worldmind

# Install Habitat-Sim with Bullet physics support
conda install -y habitat-sim==0.3.0 withbullet headless -c conda-forge -c aihabitat

# Install Habitat-Lab
cd ./habitat-lab
pip install -e habitat-lab
cd ..

2. Download Data: Download the YCB and ReplicaCAD datasets for the Language Rearrangement task.

conda install -y -c conda-forge git-lfs
python -m habitat_sim.utils.datasets_download --uids rearrange_task_assets
mv data embodiedbench/envs/eb_habitat

Note: After the above step, there should be a data folder under embodiedbench/envs/eb_habitat.

3. Verify Installation: Run the following command to confirm that the EB-Habitat environment works correctly.

python -m embodiedbench.envs.eb_habitat.EBHabEnv

🧭 EB-Navigation (Vision-and-Language Navigation)

Verify Installation: Run the following command to confirm that the EB-Navigation environment works correctly.

conda activate worldmind_nav
python -m embodiedbench.envs.eb_navigation.EBNavEnv

🚀 Quick Start

Running Experiments

We provide a universal run script run.sh for easy experiment execution. Simply configure the script and run:

#!/bin/bash
# WorldMind Universal Run Script
# Supports all three environments: Alfred (eb-alf), Habitat (eb-hab), Navigation (eb-nav)

set -e

# ============================================================
# ENVIRONMENT VARIABLES (Export Section)
# ============================================================

export CUDA_VISIBLE_DEVICES=0
export OPENAI_API_KEY="your-openai-api-key"
export OPENAI_BASE_URL="your-openai-base-url"

# ============================================================
# CONFIGURATION PARAMETERS (Edit here)
# ============================================================

MODEL_NAME="gpt-3.5-turbo"   # Choose your model
ENV="eb-hab"              # Options: eb-alf, eb-hab, eb-nav
EXP_NAME="test"       # Your experiment name
ENABLE_WORLDMIND="True"   # True or False

# WorldMind component models (fixed to MODEL_NAME)
export WORLDMIND_DISCRIMINATOR_MODEL="$MODEL_NAME"
export WORLDMIND_SUMMARIZER_MODEL="$MODEL_NAME"
export WORLDMIND_REFLECTOR_MODEL="$MODEL_NAME"
export WORLDMIND_REFINER_MODEL="$MODEL_NAME"

# ============================================================
# VALIDATION
# ============================================================

if [ -z "$OPENAI_API_KEY" ]; then
    echo "=========================================="
    echo "ERROR: OPENAI_API_KEY not set!"
    echo "=========================================="
    exit 1
fi

case "$ENV" in
    eb-alf|eb-hab|eb-nav)
        echo "✓ Valid environment: $ENV"
        ;;
    *)
        echo "=========================================="
        echo "ERROR: Invalid environment '$ENV'"
        echo "=========================================="
        echo "Valid options: eb-alf, eb-hab, eb-nav"
        exit 1
        ;;
esac

# ============================================================
# DISPLAY CONFIGURATION
# ============================================================

echo ""
echo "=========================================="
echo "WorldMind Experiment Configuration"
echo "=========================================="
echo "Environment:     $ENV"
echo "Model:           $MODEL_NAME"
echo "Experiment:      $EXP_NAME"
echo "WorldMind:       $ENABLE_WORLDMIND"
echo "----------------------------------------"
echo "GPU Device:      $CUDA_VISIBLE_DEVICES"
echo "Display:         $DISPLAY"
echo "API Base URL:    $OPENAI_BASE_URL"
echo "=========================================="
echo ""

# ============================================================
# RUN EXPERIMENT
# ============================================================

python -m embodiedbench.main \
    env="$ENV" \
    model_name="$MODEL_NAME" \
    exp_name="$EXP_NAME" \
    enable_worldmind="$ENABLE_WORLDMIND"

Usage:

bash run.sh
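
For sweeps over several environments, the launch command can also be assembled programmatically. The sketch below mirrors the `key=value` overrides and the environment check from run.sh; `build_command` and `VALID_ENVS` are hypothetical helpers, not part of the repository, and the example only prints the commands rather than running them.

```python
VALID_ENVS = ("eb-alf", "eb-hab", "eb-nav")


def build_command(env: str, model_name: str, exp_name: str, enable_worldmind: bool = True) -> list:
    """Build the argv list that run.sh would execute for one experiment."""
    if env not in VALID_ENVS:
        raise ValueError(f"Invalid environment {env!r}; valid options: {', '.join(VALID_ENVS)}")
    return [
        "python", "-m", "embodiedbench.main",
        f"env={env}",
        f"model_name={model_name}",
        f"exp_name={exp_name}",
        f"enable_worldmind={enable_worldmind}",  # renders as True or False, as in run.sh
    ]


# Print one command per environment. To launch one, pass the list to
# subprocess.run(cmd, check=True) with the API variables exported as in run.sh.
for env in VALID_ENVS:
    print(" ".join(build_command(env, "gpt-3.5-turbo", f"sweep_{env}")))
```

Note that EB-Navigation runs in the `worldmind_nav` conda environment, so a sweep across all three environments cannot share a single interpreter.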

Configuration

WorldMind uses YAML configuration files for experiment settings. You can find and customize these files in the WorldMind/embodiedbench/configs directory.

📄 Example configuration (`configs/eb-nav.yaml`):

# configs/eb-nav.yaml
model_name: gpt-4o-mini
model_type: remote
exp_name: navigation_baseline

# WorldMind Settings
enable_worldmind: True
use_vision_discriminator: false
use_experience_trajectory: true
detailed_output: true

# Goal Experience Settings
enable_goal_experience: true
goal_experience_top_k: 2

# Process Experience Settings
enable_process_experience: true
process_experience_top_k: 2

# Experience Refinement
enable_experience_refine: true
use_worldmind_template: true

Key Configuration Options

| Parameter | Description | Default |
| --- | --- | --- |
| enable_worldmind | Enable WorldMind components | True |
| enable_goal_experience | Enable goal experience retrieval | True |
| goal_experience_top_k | Number of goal experiences to retrieve | 2 |
| enable_process_experience | Enable process experience retrieval | True |
| process_experience_top_k | Number of process experiences to retrieve | 2 |
| enable_experience_refine | Enable LLM-based experience refinement | True |
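
When overriding these options from code, it can help to merge the overrides with the documented defaults and check their types before launching a run. The following is a sketch only: `DEFAULTS` copies the values from the table above, while `resolve_config` is a hypothetical helper and not how the repository loads its YAML files.

```python
DEFAULTS = {
    "enable_worldmind": True,
    "enable_goal_experience": True,
    "goal_experience_top_k": 2,
    "enable_process_experience": True,
    "process_experience_top_k": 2,
    "enable_experience_refine": True,
}


def resolve_config(overrides: dict) -> dict:
    """Merge overrides into the defaults and validate the documented keys.

    Keys that are not in DEFAULTS (for example use_vision_discriminator)
    are passed through unchanged.
    """
    config = {**DEFAULTS, **overrides}
    for key, default in DEFAULTS.items():
        value = config[key]
        if isinstance(default, bool):
            if not isinstance(value, bool):
                raise TypeError(f"{key} must be a boolean, got {value!r}")
        # bool is a subclass of int, so reject it explicitly for the top_k keys.
        elif isinstance(value, bool) or not isinstance(value, int) or value < 1:
            raise ValueError(f"{key} must be a positive integer, got {value!r}")
    return config


config = resolve_config({"goal_experience_top_k": 5, "use_vision_discriminator": False})
print(config["goal_experience_top_k"], config["process_experience_top_k"])
```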

🌍 Environments

🏠 EB-ALFRED (Household Tasks)

A benchmark for grounded language learning in 3D household environments. Tasks require agents to execute multi-step instructions involving object manipulation.

Evaluation Metrics: Success Rate (SR) and Goal Condition (GC)

Evaluation Sets: Base, Common, Complex, Visual, Spatial

🛋️ EB-Habitat (Rearrangement Tasks)

A simulation platform for embodied AI research focusing on object rearrangement tasks in realistic indoor environments.

Evaluation Metrics: Success Rate (SR) and Goal Condition (GC)

Evaluation Sets: Base, Common, Complex, Visual, Spatial

🧭 EB-Navigation (Vision-and-Language Navigation)

A discrete navigation environment where agents must reach target locations through natural language instructions.

Evaluation Metrics: Success Rate (SR)

Evaluation Sets: Base, Common, Complex, Visual
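
The two metrics listed for the environments above can be computed from per-episode results roughly as follows. The definition of Goal Condition used here (the fraction of goal conditions satisfied in each episode, averaged over episodes) is an assumption based on the usual ALFRED convention, and `EpisodeResult` is a hypothetical record type rather than the evaluator's actual output format.

```python
from dataclasses import dataclass


@dataclass
class EpisodeResult:
    """Hypothetical per-episode record."""
    success: bool          # whether the whole task was completed
    conditions_met: int    # goal conditions satisfied at the end of the episode
    conditions_total: int  # goal conditions defined for the task


def success_rate(episodes: list) -> float:
    """Percentage of episodes in which the task was fully completed."""
    return 100.0 * sum(e.success for e in episodes) / len(episodes)


def goal_condition_rate(episodes: list) -> float:
    """Mean percentage of goal conditions satisfied per episode (assumed definition)."""
    return 100.0 * sum(e.conditions_met / e.conditions_total for e in episodes) / len(episodes)


episodes = [
    EpisodeResult(success=True, conditions_met=3, conditions_total=3),
    EpisodeResult(success=False, conditions_met=1, conditions_total=2),
    EpisodeResult(success=False, conditions_met=0, conditions_total=4),
]
print(f"SR = {success_rate(episodes):.1f}%, GC = {goal_condition_rate(episodes):.1f}%")
```

Under this definition GC gives partial credit for episodes that fail overall, which is why it can exceed SR on the same set of episodes.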


🔌 WorldMind Plugin

To support integration in other domains, we provide a standalone plugin with a modular architecture. It exposes WorldMind's core capabilities, such as experience extraction and memory retrieval, so that they can be used in custom environments or new projects with minimal effort.

from worldmind_plugin import (
    WorldMindConfig,
    ProcessExperienceModule,
    GoalExperienceModule,
    ExperienceRetrievalModule,
    ProcessTrajectoryStep,
    GoalTrajectoryStep
)

# Create configuration
config = WorldMindConfig(
    api_key="your-api-key",
    save_path="./worldmind_output"
)

# Initialize modules independently
process_module = ProcessExperienceModule(config)
goal_module = GoalExperienceModule(config)
retrieval_module = ExperienceRetrievalModule(config)

# Extract goal experience from successful trajectory
trajectory = [
    GoalTrajectoryStep(
        action="navigate_to(kitchen)",
        env_feedback="Arrived at kitchen",
        observation="Kitchen counter visible"
    ),
    # ... more steps
]

experience = goal_module.extract_experience(
    task_instruction="Go to the kitchen and get an apple",
    trajectory=trajectory
)

# Retrieve experiences for a new task
result = retrieval_module.retrieve(
    task_instruction="Find the coffee mug",
    enable_refine=True
)

# Use in agent prompt
agent_prompt = f"""You are a helpful assistant.

{result['formatted_prompt']}

Task: Find the coffee mug
"""

See Plugin/README.md for detailed documentation.


📁 Project Structure

WorldMind/
├── 📂 embodiedbench/
│   ├── 📂 envs/                    # Environment implementations
│   │   ├── eb_alfred/              # ALFRED environment
│   │   ├── eb_habitat/             # Habitat environment
│   │   └── eb_navigation/          # Navigation environment
│   ├── 📂 evaluator/               # Evaluation scripts
│   └── 📂 worldmind/               # WorldMind core modules
│       ├── alfred/                 # ALFRED integration
│       ├── habitat/                # Habitat integration
│       └── navigation/             # Navigation integration
├── 📂 Plugin/                      # Standalone WorldMind Plugin
├── 📂 assets/                      # Images and resources
└── 📄 README.md

📊 Results

EB-ALFRED Results

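Success Rate (SR) and Goal Condition (GC), both in %.

| Model | SR Avg | SR Base | SR Common | SR Complex | SR Visual | SR Spatial | GC Avg | GC Base | GC Common | GC Complex | GC Visual | GC Spatial |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Open-source and Proprietary Models | | | | | | | | | | | | |
| GPT-4o | 56.8 | 64.0 | 54.0 | 68.0 | 46.0 | 52.0 | 65.1 | 74.0 | 60.3 | 74.0 | 58.3 | 61.3 |
| GPT-4o-mini | 28.8 | 34.0 | 28.0 | 36.0 | 24.0 | 22.0 | 34.3 | 47.8 | 35.3 | 43.5 | 33.3 | 29.0 |
| Claude-3.7-Sonnet | 67.2 | 68.0 | 68.0 | 70.0 | 68.0 | 62.0 | 65.3 | 72.0 | 66.0 | 76.7 | 63.0 | 59.7 |
| Gemini-1.5-Pro | 63.2 | 70.0 | 64.0 | 72.0 | 58.0 | 52.0 | 67.4 | 74.3 | 66.7 | 76.5 | 62.8 | 59.0 |
| Llama-3.2-90B-Vis | 35.2 | 38.0 | 34.0 | 44.0 | 28.0 | 32.0 | 37.6 | 43.7 | 37.3 | 49.2 | 35.3 | 36.0 |
| InternVL2.5-78B | 37.0 | 41.0 | 40.0 | 39.0 | 16.0 | 49.0 | 41.0 | 42.3 | 35.3 | 43.3 | 35.7 | 40.3 |
| GPT-3.5-turbo Based Methods | | | | | | | | | | | | |
| ReAct | 44.4 | 52.0 | 48.0 | 52.0 | 32.0 | 38.0 | 50.4 | 55.3 | 53.5 | 55.3 | 42.7 | 45.0 |
| BoN | 42.8 | 46.0 | 42.0 | 50.0 | 42.0 | 34.0 | 50.4 | 54.2 | 46.5 | 56.5 | 52.0 | 42.8 |
| SimuRA | 45.2 | 50.0 | 42.0 | 54.0 | 38.0 | 42.0 | 53.6 | 57.8 | 47.8 | 59.7 | 48.5 | 54.3 |
| ReasoningBank | 41.6 | 50.0 | 36.0 | 44.0 | 36.0 | 42.0 | 47.6 | 57.5 | 41.5 | 47.0 | 44.2 | 48.0 |
| Synapse | 38.8 | 38.0 | 46.0 | 40.0 | 36.0 | 34.0 | 43.6 | 42.5 | 51.3 | 42.7 | 42.0 | 39.7 |
| AWM | 40.0 | 46.0 | 32.0 | 48.0 | 40.0 | 34.0 | 46.2 | 53.2 | 39.2 | 50.7 | 47.0 | 41.0 |
| WorldMind | 48.0 | 58.0 | 48.0 | 56.0 | 34.0 | 44.0 | 54.1 | 63.0 | 52.7 | 61.0 | 41.7 | 52.0 |
| GPT-4.1-mini Based Methods | | | | | | | | | | | | |
| ReAct | 41.2 | 50.0 | 40.0 | 46.0 | 38.0 | 32.0 | 47.5 | 55.3 | 42.8 | 52.2 | 47.2 | 39.8 |
| BoN | 44.4 | 46.0 | 44.0 | 50.0 | 42.0 | 40.0 | 49.5 | 50.8 | 48.3 | 54.7 | 48.7 | 45.0 |
| SimuRA | 45.6 | 52.0 | 44.0 | 54.0 | 38.0 | 40.0 | 52.2 | 61.0 | 50.3 | 58.2 | 45.3 | 46.3 |
| ReasoningBank | 38.0 | 42.0 | 36.0 | 42.0 | 34.0 | 36.0 | 42.6 | 46.7 | 38.8 | 45.8 | 41.5 | 40.3 |
| Synapse | 37.2 | 40.0 | 32.0 | 44.0 | 36.0 | 34.0 | 42.2 | 41.2 | 37.5 | 49.5 | 41.3 | 41.7 |
| AWM | 41.2 | 44.0 | 36.0 | 48.0 | 38.0 | 40.0 | 46.0 | 48.3 | 42.0 | 52.5 | 44.3 | 42.7 |
| WorldMind | 49.2 | 50.0 | 58.0 | 54.0 | 42.0 | 42.0 | 55.7 | 61.0 | 61.0 | 58.8 | 48.0 | 49.7 |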
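
As a reading aid for the EB-ALFRED table above: for the four rows copied below, the SR Avg value matches the unweighted mean of the five subset scores. The snippet checks this and computes WorldMind's SR gain over ReAct for each backbone. It is a sanity check on those rows only, not a statement about how every Avg in the table was computed.

```python
from statistics import mean

# (reported SR Avg, [Base, Common, Complex, Visual, Spatial]) copied from the table above.
SR = {
    ("GPT-3.5-turbo", "ReAct"): (44.4, [52.0, 48.0, 52.0, 32.0, 38.0]),
    ("GPT-3.5-turbo", "WorldMind"): (48.0, [58.0, 48.0, 56.0, 34.0, 44.0]),
    ("GPT-4.1-mini", "ReAct"): (41.2, [50.0, 40.0, 46.0, 38.0, 32.0]),
    ("GPT-4.1-mini", "WorldMind"): (49.2, [50.0, 58.0, 54.0, 42.0, 42.0]),
}


def avg_matches(reported: float, scores: list) -> bool:
    """True if the reported average equals the mean of the subsets at one decimal."""
    return round(mean(scores), 1) == reported


def gain(backbone: str) -> float:
    """WorldMind SR Avg minus ReAct SR Avg for one backbone, in percentage points."""
    return round(SR[(backbone, "WorldMind")][0] - SR[(backbone, "ReAct")][0], 1)


for (backbone, method), (reported, scores) in SR.items():
    print(backbone, method, reported, avg_matches(reported, scores))

for backbone in ("GPT-3.5-turbo", "GPT-4.1-mini"):
    print(f"{backbone}: WorldMind vs ReAct = +{gain(backbone)} SR points")
```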

EB-Habitat Results

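Success Rate (SR) and Goal Condition (GC), both in %.

| Model | SR Avg | SR Base | SR Common | SR Complex | SR Visual | SR Spatial | GC Avg | GC Base | GC Common | GC Complex | GC Visual | GC Spatial |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Open-source and Proprietary Models | | | | | | | | | | | | |
| GPT-4o | 56.8 | 64.0 | 54.0 | 68.0 | 46.0 | 52.0 | 65.1 | 74.0 | 60.3 | 74.0 | 58.3 | 61.3 |
| GPT-4o-mini | 28.8 | 34.0 | 28.0 | 36.0 | 24.0 | 22.0 | 34.3 | 47.8 | 35.3 | 43.5 | 33.3 | 29.0 |
| Claude-3.7-Sonnet | 67.2 | 68.0 | 68.0 | 70.0 | 68.0 | 62.0 | 65.3 | 72.0 | 66.0 | 76.7 | 63.0 | 59.7 |
| Gemini-1.5-Pro | 63.2 | 70.0 | 64.0 | 72.0 | 58.0 | 52.0 | 67.4 | 74.3 | 66.7 | 76.5 | 62.8 | 59.0 |
| Llama-3.2-90B-Vis | 35.2 | 38.0 | 34.0 | 44.0 | 28.0 | 32.0 | 37.6 | 43.7 | 37.3 | 49.2 | 35.3 | 36.0 |
| InternVL2.5-78B | 37.0 | 41.0 | 40.0 | 39.0 | 16.0 | 49.0 | 41.0 | 42.3 | 35.3 | 43.3 | 35.7 | 40.3 |
| GPT-3.5-turbo Based Methods | | | | | | | | | | | | |
| ReAct | 44.4 | 52.0 | 48.0 | 52.0 | 32.0 | 38.0 | 50.4 | 55.3 | 53.5 | 55.3 | 42.7 | 45.0 |
| BoN | 42.8 | 46.0 | 42.0 | 50.0 | 42.0 | 34.0 | 50.4 | 54.2 | 46.5 | 56.5 | 52.0 | 42.8 |
| SimuRA | 45.2 | 50.0 | 42.0 | 54.0 | 38.0 | 42.0 | 53.6 | 57.8 | 47.8 | 59.7 | 48.5 | 54.3 |
| ReasoningBank | 41.6 | 50.0 | 36.0 | 44.0 | 36.0 | 42.0 | 47.6 | 57.5 | 41.5 | 47.0 | 44.2 | 48.0 |
| Synapse | 38.8 | 38.0 | 46.0 | 40.0 | 36.0 | 34.0 | 43.6 | 42.5 | 51.3 | 42.7 | 42.0 | 39.7 |
| AWM | 40.0 | 46.0 | 32.0 | 48.0 | 40.0 | 34.0 | 46.2 | 53.2 | 39.2 | 50.7 | 47.0 | 41.0 |
| WorldMind | 48.0 | 58.0 | 48.0 | 56.0 | 34.0 | 44.0 | 54.1 | 63.0 | 52.7 | 61.0 | 41.7 | 52.0 |
| GPT-4.1-mini Based Methods | | | | | | | | | | | | |
| ReAct | 41.2 | 50.0 | 40.0 | 46.0 | 38.0 | 32.0 | 47.5 | 55.3 | 42.8 | 52.2 | 47.2 | 39.8 |
| BoN | 44.4 | 46.0 | 44.0 | 50.0 | 42.0 | 40.0 | 49.5 | 50.8 | 48.3 | 54.7 | 48.7 | 45.0 |
| SimuRA | 45.6 | 52.0 | 44.0 | 54.0 | 38.0 | 40.0 | 52.2 | 61.0 | 50.3 | 58.2 | 45.3 | 46.3 |
| ReasoningBank | 38.0 | 42.0 | 36.0 | 42.0 | 34.0 | 36.0 | 42.6 | 46.7 | 38.8 | 45.8 | 41.5 | 40.3 |
| Synapse | 37.2 | 40.0 | 32.0 | 44.0 | 36.0 | 34.0 | 42.2 | 41.2 | 37.5 | 49.5 | 41.3 | 41.7 |
| AWM | 41.2 | 44.0 | 36.0 | 48.0 | 38.0 | 40.0 | 46.0 | 48.3 | 42.0 | 52.5 | 44.3 | 42.7 |
| WorldMind | 49.2 | 50.0 | 58.0 | 54.0 | 42.0 | 42.0 | 55.7 | 61.0 | 61.0 | 58.8 | 48.0 | 49.7 |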

Detailed results and ablation studies are available in our paper.


📝 Citation

If you find this work useful, please cite:

@article{ren2026aligning,
  title={Aligning Agentic World Models via Knowledgeable Experience Learning},
  author={Ren, Baochang and Yao, Yunzhi and Sun, Rui and Qiao, Shuofei and Zhang, Ningyu and Chen, Huajun},
  journal={arXiv preprint arXiv:2601.13247},
  year={2026}
}

🙏 Acknowledgments

We thank the following projects and teams for their open-source contributions:

  • EmbodiedBench for the evaluation tasks
  • ALFRED and AI2-THOR for the household task benchmark and simulation environment
  • Habitat for the rearrangement simulation platform
  • vLLM for efficient LLM inference and serving