Skip to content

tmlr-group/landscape-of-thoughts

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

74 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

Landscape of Thoughts

Visualizing the Reasoning Process of Large Language Models

Project Website Paper Hugging Face Datasets Open In Colab

Demo of Landscape of Thoughts Visualization Illustration of Landscape of thoughts for visualizing the reasoning steps of LLMs. Note that the red landscape represents wrong reasoning cases, while the blue indicates the correct ones. The darker regions in landscapes indicate more thoughts, with indicating incorrect answers and marking correct answers.

Motivation

Large language models (LLMs) increasingly rely on step-by-step reasoning for various applications, yet their reasoning processes remain poorly understood, hindering research, development, and safety efforts. Current approaches to analyze LLM reasoning lack comprehensive visualization tools that can reveal the internal structure and patterns of reasoning paths.

To address this challenge, we introduce Landscape of Thoughts, the first visualization framework designed to explore the reasoning paths of chain-of-thought and its derivatives across any multiple-choice dataset. Our approach represents reasoning states as feature vectors, capturing their distances to all answer choices, and visualizes them in 2D using t-SNE dimensionality reduction.

Key Capabilities

Through qualitative and quantitative analysis, Landscape of Thoughts enables researchers to:

  • Distinguish model performance: Effectively differentiate between strong versus weak models
  • Analyze reasoning quality: Compare correct versus incorrect reasoning paths
  • Explore task diversity: Understand reasoning patterns across different types of problems
  • Identify failure modes: Reveal undesirable reasoning patterns such as inconsistency and high uncertainty

Examples and Tutorials

Installation

We provide two installation methods:

Package Installation (Recommended)

Install the Landscape of Thoughts framework directly via pip:

pip install landscape-of-thoughts==0.1.0

Development Installation

For development or customization, clone the repository and set up the environment:

# Clone the repository
git clone https://github.com/tmlr-group/landscape-of-thoughts.git
cd landscape-of-thoughts

# Create and activate conda environment
conda create -n landscape python=3.10
conda activate landscape

# Install dependencies
pip install -r requirements.txt
pip install fire --use-pep517

Model Setup

Before analyzing your data, you need to set up a language model. For detailed instructions, see our model setup guidance.

Usage

Two ways to use the framework to plot the landscape, depending on the installation method:

  • If you installed the package, you can use the lot command to plot the landscape.
  • If you installed the framework from source, you can use the main.py script to plot the landscape.

Quick Start with Command Line Interface

After installing the package and setting up your model, you can start analyzing reasoning patterns immediately:

For example, the following command executes the complete analysis pipeline. It employs the meta-llama/Llama-3.2-1B-Instruct model to generate 10 reasoning traces for each of the first 5 examples in the AQUA dataset, using the Chain-of-Thought (cot) method. The model is hosted locally (--local) via vLLM with the API key token-abc123. Finally, it generates and saves the landscape visualization in the figures/landscape. More configuration options are available in the configuration section.

lot --task all \
    --model_name meta-llama/Llama-3.2-1B-Instruct \
    --dataset_name aqua \
    --method cot \
    --num_samples 10 \
    --start_index 0 \
    --end_index 5 \
    --output_dir figures/landscape \
    --local \
    --local_api_key token-abc123

Unified Script Interface

Use the main script for complete pipeline execution:

python main.py \
  --task all \
  --model_name meta-llama/Llama-3.2-1B-Instruct \
  --dataset_name aqua \
  --method cot \
  --num_samples 10 \
  --start_index 0 \
  --end_index 5 \
  --output_dir figures/landscape \
  --local \
  --local_api_key token-abc123

Task Control

For advanced usage and integration into research workflows, you can utilize the Python API. The task parameter allows you to control which components of the pipeline to execute:

  • sample: This option generates reasoning traces from the language model.
  • calculate: This option computes distance matrices between reasoning states.
  • plot: This option creates visualizations of the reasoning landscape.
  • all: This option executes the complete all three tasks.

The following example demonstrates how to use the API to perform each step of the analysis pipeline individually.

  1. sample generates 10 reasoning traces for the first 5 examples of the AQUA dataset using the meta-llama/Meta-Llama-3-8B-Instruct model and the CoT method.
  2. calculate computes the distance matrices for these traces.
  3. plot generates the landscape visualization from the processed data.
from lot import sample, calculate, plot

# Generate reasoning traces
features, metrics = sample(
    model_name="meta-llama/Meta-Llama-3-8B-Instruct",
    dataset_name="aqua",
    method="cot",
    num_samples=10,
    start_index=0,
    end_index=5
)

# Calculate distance matrices
distance_matrices = calculate(
    model_name="meta-llama/Meta-Llama-3-8B-Instruct",
    dataset_name="aqua",
    method="cot",
    start_index=0,
    end_index=5
)

# Generate visualizations
plot(
    model_name="Meta-Llama-3-8B-Instruct",
    dataset_name="aqua",
    method="cot",
)

Animation Visualization

The example below shows how to generate an animation for the Meta-Llama-3.1-70B-Instruct-Turbo model on the AQUA dataset using the CoT method. The animation will be saved to the figures/animation directory. Note that the Landscape-Data is pulled from the Landscape-Data dataset.

from lot.animation import animation_plot
from datasets import load_dataset

# This will download and cache the dataset in the "Landscape-Data" directory
dataset = load_dataset("GazeEzio/Landscape-Data", cache_dir="Landscape-Data")

animation_plot(
    model_name = 'Meta-Llama-3.1-70B-Instruct-Turbo',
    dataset_name = 'aqua',
    method = 'cot',
    save_root = "Landscape-Data",
    save_video = True,
    output_dir = "figures/animation",
)

For detailed examples, see animation.ipynb.

Configuration

Key Parameters

  • model_name: Identifier for the language model (e.g., meta-llama/Meta-Llama-3-8B-Instruct)
  • dataset_name: Target dataset for analysis (e.g., aqua, mmlu)
  • method: Reasoning approach (cot, tot, mcts, l2m)
  • num_samples: Number of reasoning traces to collect per example
  • start_index/end_index: Range of dataset examples to process

Supported Models

The framework supports any open-source language model accessible via API, provided that token-level log probabilities are available. Models can be hosted using:

  • vLLM: For local model serving
  • API providers: Compatible with OpenAI-style APIs

Example: Using Qwen2.5-3B-Instruct

  1. Host the model locally using vLLM:
vllm serve Qwen/Qwen2.5-3B-Instruct \
  --api-key "token-api-123" \
  --download-dir YOUR_MODEL_PATH \
  --port 8000
  1. Run analysis with the hosted model:
python main.py \
  --task all \
  --model_name Qwen/Qwen2.5-3B-Instruct \
  --dataset_name aqua \
  --method cot \
  --num_samples 10 \
  --start_index 0 \
  --end_index 5 \
  --plot_type method \
  --output_dir figures/landscape \
  --local \
  --local_api_key token-abc123

Supported Datasets

The framework accepts any multiple-choice question dataset in JSONL format with the following structure:

{
  "question": "What is the capital of France?",
  "options": ["A) London", "B) Berlin", "C) Paris", "D) Madrid"],
  "answer": "C"
}

Built-in Datasets

  • aqua: Algebraic reasoning problems
  • commonsenseqa: Common sense reasoning questions
  • mmlu: Massive multitask language understanding
  • strategyqa: Strategic reasoning questions

Custom Datasets

Create your own datasets following our format specifications. For detailed instructions on creating, validating, and using custom datasets, see our Custom Datasets Guidance.

Advanced Features

Reasoning Methods

  • Chain-of-Thought (CoT): Step-by-step sequential reasoning
  • Tree-of-Thoughts (ToT): Exploration of multiple reasoning branches
  • Monte Carlo Tree Search (MCTS): Strategic search through reasoning paths
  • Least-to-Most (L2M): Decomposes complex problems into a sequence of simpler subproblems

Visualization Types

  • Method comparison: Compare different reasoning approaches
  • Correctness analysis: Distinguish correct vs. incorrect reasoning
  • Task analysis: Explore reasoning patterns across problem types
  • Temporal dynamics: Animate reasoning progression over reasoning steps

Citation

If you find this work useful for your research, please cite:

@article{zhou2025landscape,
  title={Landscape of Thoughts: Visualizing the Reasoning Process of Large Language Models},
  author={Zhou, Zhanke and Zhu, Zhaocheng and Li, Xuan and Galkin, Mikhail and Feng, Xiao and Koyejo, Sanmi and Tang, Jian and Han, Bo},
  journal={arXiv preprint arXiv:2503.22165},
  year={2025},
  url={https://arxiv.org/abs/2503.22165}
}

Contact

For questions, technical support, or collaboration inquiries:

About

[ICLR 2025 Workshop] "Landscape of Thoughts: Visualizing the Reasoning Process of Large Language Models"

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Contributors 3

  •  
  •  
  •