📄 Paper (arXiv) • 🤗 StarWM Model • 📂 SC2-Dynamics-50K Dataset
This repository contains the official implementation of the paper World Models for Policy Refinement in StarCraft II.
We propose StarWM, the first action-conditioned world model for StarCraft II. Given the current observation and a sequence of actions, StarWM predicts structured future observations under partial observability. It is further integrated into a world-model-augmented decision system (StarWM-Agent) to enable short-horizon predictive simulation and inference-time policy refinement.
To address dynamics modeling and decision integration in this hybrid and partially observable environment, we:
- Introduce a structured textual observation representation that factorizes SC2 dynamics into five semantic modules
- Construct SC2-Dynamics-50K, the first instruction-tuning dataset for SC2 dynamics prediction, and train StarWM via supervised fine-tuning on Qwen3-8B
- Develop a multi-dimensional offline evaluation framework that measures world-model quality along four dimensions: Economy, Development, Micro-Entity, and Macro-Situation
- Propose StarWM-Agent, a Generate–Simulate–Refine decision system for inference-time policy refinement
Key offline results:
- 🟢 60% reduction in minerals prediction error (SMAPE)
- 🟢 60% improvement in self-side macro-situation consistency (AWD)
- 🟢 Significant gains in task-progress prediction and unit-attribute modeling
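For reference, SMAPE (symmetric mean absolute percentage error), the metric behind the minerals result above, can be sketched as follows. This is an illustrative implementation of the standard formula, not the paper's evaluation code:

```python
def smape(preds, targets):
    """Symmetric mean absolute percentage error, in percent.

    Standard formulation: 100 * mean(2 * |p - t| / (|p| + |t|)),
    with a term defined as 0 when both values are 0.
    """
    total = 0.0
    for p, t in zip(preds, targets):
        denom = abs(p) + abs(t)
        total += 0.0 if denom == 0 else 2.0 * abs(p - t) / denom
    return 100.0 * total / len(preds)

# Exact prediction on the first pair, 100-off on the second
print(smape([100, 200], [100, 100]))
```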
StarWM-Agent augments an LLM policy with short-horizon predictive simulation:
- Generate initial actions
- Simulate predicted future using StarWM
- Refine actions conditioned on predicted future
This enables:
- Preemptive macro-management (including Supply bottleneck anticipation)
- Lightweight combat feasibility assessment
Key online results:
- 🟢 +30% / +15% / +30% win-rate gains against SC2's Hard / Harder / VeryHard built-in AI
- 🟢 Significant reduction in supply-block rate
- 🟢 Higher resource conversion rate
- 🟢 Improved kill-loss ratio
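The Generate–Simulate–Refine loop can be sketched as below. The helper names (`propose_actions`, `world_model_predict`, `refine_actions`) are hypothetical stand-ins for the LLM policy and StarWM calls, and the toy supply-block scenario is only for illustration:

```python
def generate_simulate_refine(observation, propose_actions, world_model_predict,
                             refine_actions, horizon=5):
    """One inference-time refinement step of the Generate-Simulate-Refine loop.

    1. Generate: the policy proposes an initial action sequence.
    2. Simulate: the world model rolls the actions forward `horizon` steps.
    3. Refine: the policy revises its actions given the predicted future.
    """
    actions = propose_actions(observation)                                 # Generate
    predicted_future = world_model_predict(observation, actions, horizon)  # Simulate
    return refine_actions(observation, actions, predicted_future)          # Refine

# Toy stand-ins: a "policy" that queues workers and a "world model" that
# predicts supply usage; refinement preempts an imminent supply block.
obs = {"supply_used": 18, "supply_cap": 20}
propose = lambda o: ["train_worker", "train_worker"]
predict = lambda o, a, h: {"supply_used": o["supply_used"] + len(a),
                           "supply_cap": o["supply_cap"]}

def refine(o, a, fut):
    if fut["supply_used"] >= fut["supply_cap"]:
        return ["build_supply_depot"] + a  # anticipate the supply bottleneck
    return a

print(generate_simulate_refine(obs, propose, predict, refine))
```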
First, clone this repository:
```bash
git clone https://github.com/yxzzhang/StarWM.git
cd StarWM
```

Create a new conda environment with Python 3.12 and install the required dependencies:
```bash
conda create -n StarWM python=3.12 -y
conda activate StarWM
pip install -r requirements.txt
```

Download the SC2-Dynamics-50K dataset from Hugging Face:
```bash
hf download --repo-type dataset yxzhang2024/SC2-Dynamics-50K --local-dir ./data/
```

The dataset contains:
- `wm_train_horizon5.json`
- `wm_val_horizon5.json`
- `wm_test_horizon5.json`
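Since LLaMA-Factory is used for training, the splits are presumably alpaca-style instruction-tuning records. Assuming `instruction`/`input`/`output` fields (verify against the released files), a quick sanity check might look like:

```python
import json

def load_split(path):
    """Load one SC2-Dynamics-50K split and flag malformed records.

    Assumes a JSON list of alpaca-style records with "instruction",
    "input", and "output" keys; adjust if the released files differ.
    """
    with open(path, encoding="utf-8") as f:
        records = json.load(f)
    missing = [i for i, r in enumerate(records)
               if not all(k in r for k in ("instruction", "input", "output"))]
    return len(records), missing

# Example (hypothetical path):
# n, missing = load_split("data/wm_train_horizon5.json")
# print(n, "records;", len(missing), "malformed")
```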
Download the StarWM model from Hugging Face:
```bash
hf download yxzhang2024/StarWM --local-dir path/to/your/local/dir
```

You can deploy this model using vLLM for inference.
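For example, one common way to serve the downloaded weights with an OpenAI-compatible API (assuming a local vLLM install; the flags are illustrative and may need tuning for your hardware):

```shell
# Serve the downloaded weights with an OpenAI-compatible API on port 8000
vllm serve path/to/your/local/dir \
    --served-model-name StarWM \
    --port 8000
```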
If you prefer to train StarWM from scratch instead of downloading it, we provide a tutorial here that uses LLaMA-Factory for training.
You need to copy the SC2-Dynamics-50K dataset to the LLaMA-Factory data directory.
```bash
# Create the directory
mkdir -p train/LLaMA-Factory/data/SC2-Dynamics-50K

# Copy the files
cp data/*.json train/LLaMA-Factory/data/SC2-Dynamics-50K/
```

After preparation, the directory structure should look like this:
```
StarWM/
├── data/
│   ├── wm_train_horizon5.json
│   ├── ...
├── train/
│   └── LLaMA-Factory/
│       └── data/
│           └── SC2-Dynamics-50K/
│               ├── wm_train_horizon5.json
│               ├── ...
```
Modify `train/train_starwm.sh` to set your `model_name_or_path` and `output_dir`.
Navigate to the LLaMA-Factory directory to run the training script:
```bash
cd train/LLaMA-Factory
bash ../train_starwm.sh
```

After training, you need to merge the LoRA weights.
Modify `examples/merge_lora/qwen3_lora_sft.yaml` with your paths, then run:

```bash
llamafactory-cli export examples/merge_lora/qwen3_lora_sft.yaml
```

Then deploy the merged model using vLLM for inference.
Inference scripts are located in `offline_infer/`.
- Deploy your merged model or the pretrained StarWM using vLLM (OpenAI-API-compatible).
- Edit `offline_infer/run_experiment_wm.sh` to set:
  - `API_BASE`: your vLLM server address (e.g., `http://localhost:8000/v1`)
  - `API_KEY`: your API key (use `EMPTY` if none)
  - `MODEL_NAME`: the model name served by vLLM
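Once the server is up, a quick sanity check against the OpenAI-compatible endpoint (values shown are illustrative; substitute your own `API_BASE` and `API_KEY`):

```shell
# List the models served by vLLM; the output should include your MODEL_NAME
curl "http://localhost:8000/v1/models" -H "Authorization: Bearer EMPTY"
```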
Run the inference:
```bash
# Run from the project root
bash offline_infer/run_experiment_wm.sh
```

Evaluation scripts are in `offline_evaluate/`.
Edit `offline_evaluate/run_eval.sh` (lines 3-9) to point to your inference output files, then run:

```bash
bash offline_evaluate/run_eval.sh
```

To replicate the results from the paper (Figures 3-5, Tables 1 and 4), use the specific commands provided below.
Reproduce Figures 3-5 (offline evolution of AWD & case study):
```bash
python offline_evaluate/run_eval.py \
    --wm_file offline_infer/starwm_output_1traj/WM-final_nothink_results.jsonl \
    --zeroshot_32b_file offline_infer/starwm_output_1traj/Qwen3-32B_nothink_results.jsonl \
    --zeroshot_8b_file offline_infer/starwm_output_1traj/Qwen3-8B_nothink_results.jsonl \
    --output_dir offline_evaluate/starwm_results \
    --plot \
    --figure2_frames 384 365
```

Reproduce Tables 1 and 4 (offline evaluation results):
```bash
python offline_evaluate/run_eval.py \
    --wm_file offline_infer/starwm_output/WM-final_nothink_results.jsonl \
    --zeroshot_32b_file offline_infer/starwm_output/Qwen3-32B_nothink_results.jsonl \
    --zeroshot_8b_file offline_infer/starwm_output/Qwen3-8B_nothink_results.jsonl \
    --static_bias_file offline_infer/starwm_output/static_bias_results.jsonl \
    --output_dir offline_evaluate/starwm_results
```

The online testing component is built on the SC2Arena platform. Because the SC2Arena codebase has not yet been publicly released, we are seeking permission from its authors to open-source the related code, and we will release it once approval is granted.
If you find this work useful, please cite our paper:
```bibtex
@misc{zhang2026worldmodels,
  title={World Models for Policy Refinement in StarCraft II},
  author={Yixin Zhang and Ziyi Wang and Yiming Rong and Haoxi Wang and Jinling Jiang and Shuang Xu and Haoran Wu and Shiyu Zhou and Bo Xu},
  year={2026},
  eprint={2602.14857},
  archivePrefix={arXiv},
  primaryClass={cs.AI},
  url={https://arxiv.org/abs/2602.14857},
}
```

