


Large language models (LLMs) increasingly rely on step-by-step reasoning across a wide range of applications, yet their reasoning processes remain poorly understood, hindering research, development, and safety efforts. Current approaches to analyzing LLM reasoning lack comprehensive visualization tools that can reveal the internal structure and patterns of reasoning paths.
To address this challenge, we introduce Landscape of Thoughts, the first visualization framework designed to explore the reasoning paths of chain-of-thought and its derivatives across any multiple-choice dataset. Our approach represents reasoning states as feature vectors, capturing their distances to all answer choices, and visualizes them in 2D using t-SNE dimensionality reduction.
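The featurization can be sketched in a few lines (a minimal illustration, not the framework's actual implementation): each reasoning state becomes a vector of its distances to the embeddings of all answer choices, and t-SNE projects these vectors to 2D. The `embed` function below is a hypothetical stand-in for a real sentence encoder.

```python
import hashlib
import numpy as np
from sklearn.manifold import TSNE

def embed(text: str, dim: int = 16) -> np.ndarray:
    # Hypothetical stand-in for a real sentence encoder:
    # a deterministic pseudo-random vector per string.
    seed = int.from_bytes(hashlib.md5(text.encode()).digest()[:4], "little")
    return np.random.default_rng(seed).normal(size=dim)

choices = ["A) London", "B) Berlin", "C) Paris", "D) Madrid"]
states = [f"reasoning step {i}" for i in range(12)]

choice_vecs = np.stack([embed(c) for c in choices])

# Feature vector of a reasoning state = its distances to all answer choices.
features = np.stack([
    np.linalg.norm(choice_vecs - embed(s), axis=1) for s in states
])  # shape: (num_states, num_choices)

# Project the distance features to 2D for plotting the landscape.
coords = TSNE(n_components=2, perplexity=5, random_state=0).fit_transform(features)
print(coords.shape)  # (12, 2)
```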
Through qualitative and quantitative analysis, Landscape of Thoughts enables researchers to:
- Distinguish model performance: Effectively differentiate between strong versus weak models
- Analyze reasoning quality: Compare correct versus incorrect reasoning paths
- Explore task diversity: Understand reasoning patterns across different types of problems
- Identify failure modes: Reveal undesirable reasoning patterns such as inconsistency and high uncertainty
- Dataset and Model Usage
- Custom Model Integration
- Dataset Creation
- Prompt Customization
- Animation Tutorial
- Quick Start Notebook
We provide two installation methods:
Install the Landscape of Thoughts framework directly via pip:
pip install landscape-of-thoughts==0.1.0
For development or customization, clone the repository and set up the environment:
# Clone the repository
git clone https://github.com/tmlr-group/landscape-of-thoughts.git
cd landscape-of-thoughts
# Create and activate conda environment
conda create -n landscape python=3.10
conda activate landscape
# Install dependencies
pip install -r requirements.txt
pip install fire --use-pep517
Before analyzing your data, you need to set up a language model. For detailed instructions, see our model setup guidance.
There are two ways to plot the landscape, depending on the installation method:
- If you installed the package via pip, use the `lot` command to plot the landscape.
- If you installed the framework from source, use the `main.py` script to plot the landscape.
After installing the package and setting up your model, you can start analyzing reasoning patterns immediately:
For example, the following command executes the complete analysis pipeline. It uses the `meta-llama/Llama-3.2-1B-Instruct` model to generate 10 reasoning traces for each of the first 5 examples in the `AQUA` dataset with the Chain-of-Thought (`cot`) method. The model is hosted locally (`--local`) via vLLM with the API key `token-abc123`. Finally, it generates and saves the landscape visualization in the `figures/landscape` directory. More configuration options are available in the configuration section.
lot --task all \
--model_name meta-llama/Llama-3.2-1B-Instruct \
--dataset_name aqua \
--method cot \
--num_samples 10 \
--start_index 0 \
--end_index 5 \
--output_dir figures/landscape \
--local \
--local_api_key token-abc123
Use the main script for complete pipeline execution:
python main.py \
--task all \
--model_name meta-llama/Llama-3.2-1B-Instruct \
--dataset_name aqua \
--method cot \
--num_samples 10 \
--start_index 0 \
--end_index 5 \
--output_dir figures/landscape \
--local \
--local_api_key token-abc123
For advanced usage and integration into research workflows, you can use the Python API. The `task` parameter controls which components of the pipeline to execute:
- `sample`: generates reasoning traces from the language model
- `calculate`: computes distance matrices between reasoning states
- `plot`: creates visualizations of the reasoning landscape
- `all`: executes all three tasks in sequence
The following example demonstrates how to use the API to perform each step of the analysis pipeline individually:
- `sample` generates 10 reasoning traces for the first 5 examples of the `AQUA` dataset using the `meta-llama/Meta-Llama-3-8B-Instruct` model and the CoT method.
- `calculate` computes the distance matrices for these traces.
- `plot` generates the landscape visualization from the processed data.
from lot import sample, calculate, plot

# Generate reasoning traces
features, metrics = sample(
    model_name="meta-llama/Meta-Llama-3-8B-Instruct",
    dataset_name="aqua",
    method="cot",
    num_samples=10,
    start_index=0,
    end_index=5,
)

# Calculate distance matrices
distance_matrices = calculate(
    model_name="meta-llama/Meta-Llama-3-8B-Instruct",
    dataset_name="aqua",
    method="cot",
    start_index=0,
    end_index=5,
)

# Generate visualizations
plot(
    model_name="Meta-Llama-3-8B-Instruct",
    dataset_name="aqua",
    method="cot",
)
The example below shows how to generate an animation for the `Meta-Llama-3.1-70B-Instruct-Turbo` model on the `AQUA` dataset using the CoT method. The animation is saved to the `figures/animation` directory. Note that the precomputed reasoning data is pulled from the `GazeEzio/Landscape-Data` dataset.
from lot.animation import animation_plot
from datasets import load_dataset

# This will download and cache the dataset in the "Landscape-Data" directory
dataset = load_dataset("GazeEzio/Landscape-Data", cache_dir="Landscape-Data")

animation_plot(
    model_name="Meta-Llama-3.1-70B-Instruct-Turbo",
    dataset_name="aqua",
    method="cot",
    save_root="Landscape-Data",
    save_video=True,
    output_dir="figures/animation",
)
For detailed examples, see animation.ipynb.
- `model_name`: Identifier for the language model (e.g., `meta-llama/Meta-Llama-3-8B-Instruct`)
- `dataset_name`: Target dataset for analysis (e.g., `aqua`, `mmlu`)
- `method`: Reasoning approach (`cot`, `tot`, `mcts`, `l2m`)
- `num_samples`: Number of reasoning traces to collect per example
- `start_index`/`end_index`: Range of dataset examples to process
The framework supports any open-source language model accessible via API, provided that token-level log probabilities are available. Models can be hosted using:
- vLLM: For local model serving
- API providers: Compatible with OpenAI-style APIs
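To illustrate the log-probability requirement, an OpenAI-style completions request that asks the server for per-token log probabilities might look like the sketch below. The base URL, API key, and model name are placeholders; substitute your own deployment.

```python
import json

# Placeholder values -- substitute your own server, key, and model.
BASE_URL = "http://localhost:8000/v1"
API_KEY = "token-abc123"

payload = {
    "model": "Qwen/Qwen2.5-3B-Instruct",
    "prompt": "Q: What is 2 + 2?\nA:",
    "max_tokens": 16,
    # The framework needs per-token log probabilities from the server,
    # so the request must ask for them explicitly.
    "logprobs": 1,
}

request_body = json.dumps(payload)
print("POST", f"{BASE_URL}/completions")
```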
- Host the model locally using vLLM:
vllm serve Qwen/Qwen2.5-3B-Instruct \
--api-key "token-abc123" \
--download-dir YOUR_MODEL_PATH \
--port 8000
- Run analysis with the hosted model:
python main.py \
--task all \
--model_name Qwen/Qwen2.5-3B-Instruct \
--dataset_name aqua \
--method cot \
--num_samples 10 \
--start_index 0 \
--end_index 5 \
--plot_type method \
--output_dir figures/landscape \
--local \
--local_api_key token-abc123
The framework accepts any multiple-choice question dataset in JSONL format with the following structure:
{
"question": "What is the capital of France?",
"options": ["A) London", "B) Berlin", "C) Paris", "D) Madrid"],
"answer": "C"
}
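A custom dataset in this format can be written and sanity-checked with a few lines of Python (a minimal sketch; the filename is arbitrary):

```python
import json

# Two example records in the expected multiple-choice schema.
examples = [
    {
        "question": "What is the capital of France?",
        "options": ["A) London", "B) Berlin", "C) Paris", "D) Madrid"],
        "answer": "C",
    },
    {
        "question": "What is 3 * 4?",
        "options": ["A) 7", "B) 12", "C) 34", "D) 81"],
        "answer": "B",
    },
]

# Write one JSON object per line (JSONL).
with open("my_dataset.jsonl", "w") as f:
    for ex in examples:
        f.write(json.dumps(ex) + "\n")

# Validate: every line parses, has the required keys, and the answer
# letter matches the prefix of one of the options.
with open("my_dataset.jsonl") as f:
    for line in f:
        ex = json.loads(line)
        assert {"question", "options", "answer"} <= ex.keys()
        assert any(opt.startswith(ex["answer"] + ")") for opt in ex["options"])
```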
- `aqua`: Algebraic reasoning problems
- `commonsenseqa`: Common sense reasoning questions
- `mmlu`: Massive multitask language understanding
- `strategyqa`: Strategic reasoning questions
Create your own datasets following our format specifications. For detailed instructions on creating, validating, and using custom datasets, see our Custom Datasets Guidance.
- Chain-of-Thought (CoT): Step-by-step sequential reasoning
- Tree-of-Thoughts (ToT): Exploration of multiple reasoning branches
- Monte Carlo Tree Search (MCTS): Strategic search through reasoning paths
- Least-to-Most (L2M): Decomposes complex problems into a sequence of simpler subproblems
- Method comparison: Compare different reasoning approaches
- Correctness analysis: Distinguish correct vs. incorrect reasoning
- Task analysis: Explore reasoning patterns across problem types
- Temporal dynamics: Animate reasoning progression over reasoning steps
If you find this work useful for your research, please cite:
@article{zhou2025landscape,
  title={Landscape of Thoughts: Visualizing the Reasoning Process of Large Language Models},
  author={Zhou, Zhanke and Zhu, Zhaocheng and Li, Xuan and Galkin, Mikhail and Feng, Xiao and Koyejo, Sanmi and Tang, Jian and Han, Bo},
  journal={arXiv preprint arXiv:2503.22165},
  year={2025},
  url={https://arxiv.org/abs/2503.22165}
}
For questions, technical support, or collaboration inquiries:
- Email: Zhanke Zhou ([email protected]), Zhaocheng Zhu ([email protected]), Xuan Li ([email protected])
- Issues: GitHub Issues