LLM-Hanabi: A Benchmark for Theory-of-Mind in Multi-Agent Collaboration

This repository implements LLM-Hanabi, a benchmark for evaluating rationale inference and Theory-of-Mind (ToM) capabilities of Large Language Models (LLMs) in the cooperative card game Hanabi. It assesses how well LLMs infer others' intentions (1st-order ToM) and predict others' interpretations (2nd-order ToM) in a dynamic, collaborative setting with imperfect information.

Overview

Hanabi is a cooperative game where 2–5 players build firework stacks by playing cards in order, using limited hints to convey information. This codebase simulates games with LLM-driven agents, evaluates their ToM proficiency, and analyzes correlations between ToM and game performance.
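The firework-building rule at the core of Hanabi can be sketched in a few lines. This sketch follows the standard published Hanabi rules and is not taken from HanabiEnv.py. Under those rules, each color's stack must grow 1, 2, 3, 4, 5 in order, a misplay costs one fuse token, and the score is the sum of stack heights. `Card`, `is_playable`, `play`, and `score` are illustrative names, and details such as regaining a hint token when a stack is completed are left out.

```python
from dataclasses import dataclass
from typing import Dict, Tuple


@dataclass(frozen=True)
class Card:
    color: str  # e.g. "red"
    rank: int   # 1 to 5 in standard Hanabi


def is_playable(card: Card, fireworks: Dict[str, int]) -> bool:
    """A card is playable iff its rank is exactly one above its color's current stack top."""
    return fireworks.get(card.color, 0) + 1 == card.rank


def play(card: Card, fireworks: Dict[str, int], fuse_tokens: int) -> Tuple[Dict[str, int], int]:
    """Attempt to play `card`; return the new fireworks and the remaining fuse tokens.

    A successful play raises that color's stack; a misplay leaves the stacks
    unchanged and costs one fuse token. The input dict is not modified.
    """
    new_fireworks = dict(fireworks)
    if is_playable(card, fireworks):
        new_fireworks[card.color] = card.rank
        return new_fireworks, fuse_tokens
    return new_fireworks, fuse_tokens - 1


def score(fireworks: Dict[str, int]) -> int:
    """Game score: the sum of stack heights (25 is a perfect game with 5 colors)."""
    return sum(fireworks.values())
```

Players cannot see their own cards, so they depend on teammates' hints to know when a card satisfies `is_playable`. This is where inferring the intention behind a hint becomes necessary.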

Key Features

Game Simulation: Supports 2–5 players with configurable tokens and AI strategies (e.g., Chain-of-Thought, Adaptive Behavior Design).
ToM Evaluation: Scores 1st-order (0–10) and 2nd-order (0–5) ToM based on agents' rationales and actions.
Correlation Analysis: Computes Pearson correlations between ToM scores and game scores.
Logging: Saves game logs, ToM records, and summaries in JSON/CSV formats.
Scalability: Uses multiprocessing for parallel game simulations.
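The correlation analysis can be illustrated with a dependency-free Pearson coefficient. This is a sketch and not necessarily how hanabi.py computes it, since it may call a library routine such as `scipy.stats.pearsonr` instead. `pearson` is an illustrative name. Pearson's r does not change under a positive linear rescaling of either variable. For that reason, the different ranges of the 1st-order (0–10) and 2nd-order (0–5) ToM scores do not need to be normalized before each is correlated with game score.

```python
import math
from typing import Sequence


def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equal-length samples."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Covariance and variances, all left unscaled: the 1/n factors cancel in r.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when one input is constant")
    return cov / math.sqrt(var_x * var_y)
```

For example, `pearson([1, 2, 3, 4], [1, 3, 2, 4])` returns 0.8, and doubling every value in the first list gives the same result.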

Repository Structure

hanabi.py: Main script to run game simulations, manage player groups, and compute ToM and game score correlations.
HanabiEnv.py: Implements the Hanabi game environment, providing interfaces for different agents to interact with the game.
Agents.py: Defines LLM-driven agent classes (LLMsAgent, Basic_LLMsAgent, CoT_LLMsAgent, ABD_LLMsAgent). New agent types can be added here.
ToM_eval.py: Evaluates 1st-order and 2nd-order ToM scores based on agents' rationales and actions.
call_api.py: Handles API calls to LLM providers (e.g., OpenRouter).
players_groups.yaml: Configures player groups, including models, strategies, and parameters. The ABD strategy enables ToM-based reasoning and scoring.
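Agents.py is where new agent types are added, but its base-class interface is not documented here. The following is therefore a hypothetical sketch of one common pattern. A registry maps strategy names, of the kind a YAML config might use, to agent classes, and each subclass overrides only how its prompt is built. `BaseAgent`, `BasicAgent`, `CoTAgent`, `build_prompt`, `AGENT_REGISTRY`, `register`, `make_agent`, and the strategy keys are all assumptions, not the repository's API.

```python
from typing import Callable, Dict, Type


class BaseAgent:
    """Hypothetical minimal base class; the real LLMsAgent interface may differ."""

    def __init__(self, model: str, temperature: float = 0.0):
        self.model = model
        self.temperature = temperature

    def build_prompt(self, observation: str) -> str:
        raise NotImplementedError


# Maps a strategy name (as it might appear in a YAML config) to an agent class.
AGENT_REGISTRY: Dict[str, Type[BaseAgent]] = {}


def register(name: str) -> Callable[[Type[BaseAgent]], Type[BaseAgent]]:
    """Class decorator that records an agent class under `name`."""
    def decorator(cls: Type[BaseAgent]) -> Type[BaseAgent]:
        AGENT_REGISTRY[name] = cls
        return cls
    return decorator


@register("basic")
class BasicAgent(BaseAgent):
    def build_prompt(self, observation: str) -> str:
        return f"{observation}\nChoose your next action."


@register("cot")
class CoTAgent(BaseAgent):
    def build_prompt(self, observation: str) -> str:
        # Chain-of-Thought style: ask the model to reason before acting.
        return f"{observation}\nThink step by step, then choose your next action."


def make_agent(strategy: str, model: str, **kwargs) -> BaseAgent:
    """Instantiate the agent class registered under `strategy`."""
    try:
        cls = AGENT_REGISTRY[strategy]
    except KeyError:
        raise ValueError(f"unknown strategy: {strategy!r}") from None
    return cls(model, **kwargs)
```

With this design, adding a strategy means writing one decorated subclass, and the code that reads the configuration and calls `make_agent` does not change.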

Installation

Clone Repository:

git clone git@github.com:HKUST-KnowComp/LLM-Hanabi.git
cd LLM-Hanabi

Install Dependencies: Install the required Python packages using the provided requirements.txt:

pip install -r requirements.txt

Configure Environment:
- API Keys: Update call_api.py with your API keys to enable LLM calls.
- Player Settings: Modify players_groups.yaml to configure the players (models and strategies), the number of players, or the sampling temperature.

Run Simulations: Execute the main script to run games with your desired configuration:

python hanabi.py --group Single_model_group --game_name LLM-Hanabi --batch 30 --num_processes 15

--group: Specify the player group from players_groups.yaml (e.g., Single_model_group).
--game_name: Set a custom name for the game (e.g., LLM-Hanabi).
--batch: Number of games to simulate (e.g., 30).
--num_processes: Number of parallel processes (e.g., 15, adjust based on your system's capabilities).
--log: Add this flag to record detailed model responses in <num_players>_players-<game_name>-log.json.
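The documented command-line flags map naturally onto Python's `argparse`. The sketch below mirrors only the flags listed above and is not the parser in hanabi.py. Its help strings paraphrase this README. Defaults and required flags are not documented, so none are assumed here: an omitted flag parses as `None`, or as `False` for `--log`.

```python
import argparse
import shlex


def build_parser() -> argparse.ArgumentParser:
    """Parser mirroring the flags documented in this README (sketch)."""
    parser = argparse.ArgumentParser(description="Run LLM-Hanabi game simulations (sketch).")
    parser.add_argument("--group", help="player group name defined in players_groups.yaml")
    parser.add_argument("--game_name", help="custom name used in output file names")
    parser.add_argument("--batch", type=int, help="number of games to simulate")
    parser.add_argument("--num_processes", type=int, help="number of parallel worker processes")
    parser.add_argument("--log", action="store_true", help="record detailed model responses")
    return parser


if __name__ == "__main__":
    # Parse the example command from this README.
    example = "--group Single_model_group --game_name LLM-Hanabi --batch 30 --num_processes 15"
    args = build_parser().parse_args(shlex.split(example))
    print(args.group, args.batch, args.num_processes, args.log)
```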

Environment Configurations

To replicate this environment, use:

pip install -r requirements.txt

Outputs

Results are stored in the game_log/ folder:

<num_players>_players-<game_name>-record.csv: Detailed scores for each game (game score, ToM1, ToM2, rounds).
<num_players>_players-<game_name>-summary.json: Game configuration and summary statistics (average score, std, highest/lowest scores, ToM scores, correlations).
ToM_record.json: ToM scores and rationales for each game.
<num_players>_players-<game_name>-log.json: Detailed model responses for each game (if --log is enabled).
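To make the output naming scheme concrete, here is a sketch that builds the documented file paths and computes summary statistics of the kind listed for the summary file. The function names and dictionary keys (`output_paths`, `summarize`, `average`, `std`, `highest`, `lowest`) are hypothetical and do not reflect the actual JSON schema. The README also does not state whether the repository reports sample or population standard deviation. This sketch uses the sample version (`statistics.stdev`), which requires at least two games.

```python
import statistics
from pathlib import Path
from typing import Dict, Sequence


def output_paths(num_players: int, game_name: str, folder: str = "game_log") -> Dict[str, Path]:
    """Build the output file paths following the naming pattern described above."""
    prefix = f"{num_players}_players-{game_name}"
    base = Path(folder)
    return {
        "record": base / f"{prefix}-record.csv",
        "summary": base / f"{prefix}-summary.json",
        "log": base / f"{prefix}-log.json",  # only written when --log is set
        "tom": base / "ToM_record.json",
    }


def summarize(game_scores: Sequence[float]) -> Dict[str, float]:
    """Average, sample std, and highest/lowest game score over a batch."""
    return {
        "average": statistics.mean(game_scores),
        "std": statistics.stdev(game_scores),  # sample std; needs >= 2 games
        "highest": max(game_scores),
        "lowest": min(game_scores),
    }
```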
