📄 Paper (arXiv) • 🤗 StarWM Model • 📂 SC2-Dynamics-50K Dataset
This repository contains the official implementation of the paper World Models for Policy Refinement in StarCraft II.
We propose StarWM, the first action-conditioned world model for StarCraft II. Given the current observation and a sequence of actions, StarWM predicts structured future observations under partial observability. It is further integrated into a world-model-augmented decision system (StarWM-Agent) to enable short-horizon predictive simulation and inference-time policy refinement.
To address dynamics modeling and decision integration in this hybrid and partially observable environment, we:
- Introduce a structured textual observation representation that factorizes SC2 dynamics into five semantic modules
- Construct SC2-Dynamics-50K, the first instruction-tuning dataset for SC2 dynamics prediction, and train StarWM via supervised fine-tuning on Qwen3-8B
- Develop a multi-dimensional offline evaluation framework that measures world-model quality along four dimensions: Economy, Development, Micro-Entity, and Macro-Situation
- Propose StarWM-Agent, a Generate–Simulate–Refine decision system for inference-time policy refinement
Key offline results:
- 🟢 60% reduction in minerals prediction error (SMAPE)
- 🟢 60% improvement in self-side macro-situation consistency (AWD)
- 🟢 Significant gains in task-progress prediction and unit-attribute modeling
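For reference, SMAPE (symmetric mean absolute percentage error), the metric behind the minerals result above, can be sketched as follows. This is an illustrative implementation of the standard formula, not the paper's evaluation code:

```python
def smape(preds, targets):
    """Symmetric mean absolute percentage error, in percent.

    Standard formulation: 100 * mean(2 * |p - t| / (|p| + |t|)),
    with a term defined as 0 when both values are 0.
    """
    total = 0.0
    for p, t in zip(preds, targets):
        denom = abs(p) + abs(t)
        total += 0.0 if denom == 0 else 2.0 * abs(p - t) / denom
    return 100.0 * total / len(preds)

# Exact prediction on the first pair, 100-off on the second
print(smape([100, 200], [100, 100]))
```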
StarWM-Agent augments an LLM policy with short-horizon predictive simulation:
- Generate initial actions
- Simulate predicted future using StarWM
- Refine actions conditioned on predicted future
This enables:
- Preemptive macro-management (including Supply bottleneck anticipation)
- Lightweight combat feasibility assessment
Key online results:
- 🟢 +30% / +15% / +30% win-rate gains against SC2's Hard / Harder / VeryHard built-in AI
- 🟢 Significant reduction in supply-block rate
- 🟢 Higher resource conversion rate
- 🟢 Improved kill-loss ratio
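The Generate–Simulate–Refine loop can be sketched as below. The helper names (`propose_actions`, `world_model_predict`, `refine_actions`) are hypothetical stand-ins for the LLM policy and StarWM calls, and the toy supply-block scenario is only for illustration:

```python
def generate_simulate_refine(observation, propose_actions, world_model_predict,
                             refine_actions, horizon=5):
    """One inference-time refinement step of the Generate-Simulate-Refine loop.

    1. Generate: the policy proposes an initial action sequence.
    2. Simulate: the world model rolls the actions forward `horizon` steps.
    3. Refine: the policy revises its actions given the predicted future.
    """
    actions = propose_actions(observation)                                 # Generate
    predicted_future = world_model_predict(observation, actions, horizon)  # Simulate
    return refine_actions(observation, actions, predicted_future)          # Refine

# Toy stand-ins: a "policy" that queues workers and a "world model" that
# predicts supply usage; refinement preempts an imminent supply block.
obs = {"supply_used": 18, "supply_cap": 20}
propose = lambda o: ["train_worker", "train_worker"]
predict = lambda o, a, h: {"supply_used": o["supply_used"] + len(a),
                           "supply_cap": o["supply_cap"]}

def refine(o, a, fut):
    if fut["supply_used"] >= fut["supply_cap"]:
        return ["build_supply_depot"] + a  # anticipate the supply bottleneck
    return a

print(generate_simulate_refine(obs, propose, predict, refine))
```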
First, clone this repository:
```bash
git clone https://github.com/yxzzhang/StarWM.git
cd StarWM
```

Create a new conda environment with Python 3.12 and install the required dependencies:
```bash
conda create -n StarWM python=3.12 -y
conda activate StarWM
pip install -r requirements.txt
```

Download the SC2-Dynamics-50K dataset from Hugging Face:
```bash
hf download --repo-type dataset yxzhang2024/SC2-Dynamics-50K --local-dir ./data/
```

The dataset contains:
- `wm_train_horizon5.json`
- `wm_val_horizon5.json`
- `wm_test_horizon5.json`
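Since LLaMA-Factory is used for training, the splits are presumably alpaca-style instruction-tuning records. Assuming `instruction`/`input`/`output` fields (verify against the released files), a quick sanity check might look like:

```python
import json

def load_split(path):
    """Load one SC2-Dynamics-50K split and flag malformed records.

    Assumes a JSON list of alpaca-style records with "instruction",
    "input", and "output" keys; adjust if the released files differ.
    """
    with open(path, encoding="utf-8") as f:
        records = json.load(f)
    missing = [i for i, r in enumerate(records)
               if not all(k in r for k in ("instruction", "input", "output"))]
    return len(records), missing

# Example (hypothetical path):
# n, missing = load_split("data/wm_train_horizon5.json")
# print(n, "records;", len(missing), "malformed")
```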
Download the StarWM model from Hugging Face:
```bash
hf download yxzhang2024/StarWM --local-dir path/to/your/local/dir
```

You can deploy this model using vLLM for inference.
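For example, one common way to serve the downloaded weights with an OpenAI-compatible API (assuming a local vLLM install; the flags are illustrative and may need tuning for your hardware):

```shell
# Serve the downloaded weights with an OpenAI-compatible API on port 8000
vllm serve path/to/your/local/dir \
    --served-model-name StarWM \
    --port 8000
```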
If you prefer to train StarWM from scratch instead of downloading it, we provide a tutorial here that uses LLaMA-Factory for training.
You need to copy the SC2-Dynamics-50K dataset to the LLaMA-Factory data directory.
```bash
# Create the directory
mkdir -p train/LLaMA-Factory/data/SC2-Dynamics-50K

# Copy the files
cp data/*.json train/LLaMA-Factory/data/SC2-Dynamics-50K/
```

After preparation, the directory structure should look like this:
```
StarWM/
├── data/
│   ├── wm_train_horizon5.json
│   ├── ...
├── train/
│   └── LLaMA-Factory/
│       └── data/
│           └── SC2-Dynamics-50K/
│               ├── wm_train_horizon5.json
│               ├── ...
```
Modify `train/train_starwm.sh` to set your `model_name_or_path` and `output_dir`.
Navigate to the LLaMA-Factory directory to run the training script:
```bash
cd train/LLaMA-Factory
bash ../train_starwm.sh
```

After training, you need to merge the LoRA weights.
Modify `examples/merge_lora/qwen3_lora_sft.yaml` with your paths, then run:

```bash
llamafactory-cli export examples/merge_lora/qwen3_lora_sft.yaml
```

Then deploy the merged model using vLLM for inference.
Inference scripts are located in `offline_infer/`.
- Deploy your merged model or the pretrained StarWM using vLLM (OpenAI-API-compatible).
- Edit `offline_infer/run_experiment_wm.sh` to set:
  - `API_BASE`: your vLLM server address (e.g., `http://localhost:8000/v1`)
  - `API_KEY`: your API key (use `EMPTY` if none)
  - `MODEL_NAME`: the model name served by vLLM
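Once the server is up, a quick sanity check against the OpenAI-compatible endpoint (values shown are illustrative; substitute your own `API_BASE` and `API_KEY`):

```shell
# List the models served by vLLM; the output should include your MODEL_NAME
curl "http://localhost:8000/v1/models" -H "Authorization: Bearer EMPTY"
```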
Run the inference:
```bash
# Run from the project root
bash offline_infer/run_experiment_wm.sh
```

Evaluation scripts are in `offline_evaluate/`.
Edit `offline_evaluate/run_eval.sh` (lines 3-9) to point to your inference output files, then run:

```bash
bash offline_evaluate/run_eval.sh
```

To replicate the results from the paper (Figures 3-5, Tables 1 and 4), use the specific commands provided below.
Reproduce Figures 3-5 (offline evolution of AWD & case study):
```bash
python offline_evaluate/run_eval.py \
    --wm_file offline_infer/starwm_output_1traj/WM-final_nothink_results.jsonl \
    --zeroshot_32b_file offline_infer/starwm_output_1traj/Qwen3-32B_nothink_results.jsonl \
    --zeroshot_8b_file offline_infer/starwm_output_1traj/Qwen3-8B_nothink_results.jsonl \
    --output_dir offline_evaluate/starwm_results \
    --plot \
    --figure2_frames 384 365
```

Reproduce Tables 1 and 4 (offline evaluation results):
```bash
python offline_evaluate/run_eval.py \
    --wm_file offline_infer/starwm_output/WM-final_nothink_results.jsonl \
    --zeroshot_32b_file offline_infer/starwm_output/Qwen3-32B_nothink_results.jsonl \
    --zeroshot_8b_file offline_infer/starwm_output/Qwen3-8B_nothink_results.jsonl \
    --static_bias_file offline_infer/starwm_output/static_bias_results.jsonl \
    --output_dir offline_evaluate/starwm_results
```

The online testing component is built on the SC2Arena platform. Because the SC2Arena codebase has not yet been publicly released, we are seeking permission from its authors to open-source the related code, and we will release it once approval is granted.
If you find this work useful, please cite our paper:
```bibtex
@misc{zhang2026worldmodels,
  title={World Models for Policy Refinement in StarCraft II},
  author={Yixin Zhang and Ziyi Wang and Yiming Rong and Haoxi Wang and Jinling Jiang and Shuang Xu and Haoran Wu and Shiyu Zhou and Bo Xu},
  year={2026},
  eprint={2602.14857},
  archivePrefix={arXiv},
  primaryClass={cs.AI},
  url={https://arxiv.org/abs/2602.14857},
}
```

