This repository contains code for generating Minecraft maps using multiple generative approaches:
- Diffusion Model: A denoising diffusion model that gradually refines random noise into coherent Minecraft structures.
- Variational Autoencoder (VAE): A neural network that encodes Minecraft maps into a latent space and decodes them back.
- Conditional Flow Matching (CFM): A generative model that produces new Minecraft maps by learning a flow (velocity field) in the VAE's latent space.
The project consists of the following components:
- Minecraft Dataset: A PyTorch dataset for loading Minecraft schematic files.
- Diffusion Model: A 3D UNet-based diffusion model for generating Minecraft structures.
- Variational Autoencoder (VAE): An encoder-decoder network that compresses Minecraft maps into a latent space and reconstructs them.
- Conditional Flow Matching (CFM): A flow-based generator trained in the VAE's latent space.
The following dependencies are required:
- Python 3.6+
- PyTorch
- NumPy
- Matplotlib
- tqdm
- nbtlib (for loading schematic files)
- scikit-learn (for t-SNE visualization)
Place your Minecraft schematic files in a directory (e.g., `minecraft-schematics-raw`). The dataset will automatically load and process these files.
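As a quick orientation, here is a minimal sketch of how the dataset might be consumed; the class name `MinecraftDataset`, its constructor argument, and the `(blocks, mask)` return format are assumptions rather than the repository's confirmed API.

```python
from torch.utils.data import DataLoader
from minecraft_dataset import MinecraftDataset  # assumed class name

# Assumed constructor; the actual argument names may differ.
dataset = MinecraftDataset(data_dir="minecraft-schematics-raw")
loader = DataLoader(dataset, batch_size=64, shuffle=True)

for blocks, mask in loader:
    # blocks: batch of 16x16x16 block volumes
    # mask:   valid-position mask used for conditioning
    break
```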
Train the diffusion model:

```bash
python main.py --batch_size 64 --learning_rate 1e-4 --diffusion_steps 1000 --noise_schedule linear
```

Sample from a trained diffusion model:

```bash
python sample.py --model_path models/model000100.pt --num_samples 10
```

Train the VAE:

```bash
python vae_demo.py --data_dir minecraft-schematics-raw --train --epochs 20 --save_model models/minecraft_vae.pth
```

Evaluate the trained VAE and explore its latent space:

```bash
python vae_demo.py --data_dir minecraft-schematics-raw --load_model models/minecraft_vae.pth --evaluate --visualize_latent --generate --interpolate
```

Train the Conditional Flow Matching model on top of the trained VAE:

```bash
python flow_matching.py --data_dir minecraft-schematics-raw --vae_model models/minecraft_vae.pth --train --epochs 20 --save_flow_model models/minecraft_flow.pth
```

Generate samples with the trained flow model:

```bash
python flow_matching.py --data_dir minecraft-schematics-raw --vae_model models/minecraft_vae.pth --load_flow_model models/minecraft_flow.pth --generate --n_samples 10
```
The diffusion model is a 3D UNet that:
- Takes as input a tensor of shape `[batch_size, num_blocks, 16, 16, 16]`, i.e. a batch of 16×16×16 block volumes with `num_blocks` channels
- Gradually denoises random noise into coherent Minecraft structures
- Uses a mask for conditioning on valid positions
- Employs a standard diffusion process with a noise schedule (see the sketch below)
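For intuition, here is a minimal sketch of the standard DDPM forward (noising) step with a linear schedule, which is what training such a model relies on; it is an illustration rather than the repository's exact code, and the tensor names are assumptions.

```python
import torch

diffusion_steps = 1000
betas = torch.linspace(1e-4, 0.02, diffusion_steps)   # linear noise schedule
alphas_cumprod = torch.cumprod(1.0 - betas, dim=0)    # cumulative product of (1 - beta_t)

def q_sample(x_0, t, noise):
    """Noise clean volumes x_0 to timestep t (standard DDPM forward process)."""
    a_bar = alphas_cumprod[t].view(-1, 1, 1, 1, 1)    # broadcast over [B, num_blocks, 16, 16, 16]
    return a_bar.sqrt() * x_0 + (1.0 - a_bar).sqrt() * noise

# During training, the UNet is asked to predict `noise` from (x_t, t), with the
# mask supplied (e.g. via model_kwargs) to condition on valid positions.
```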
The VAE consists of:
- An encoder that maps Minecraft maps to a latent space
- A decoder that reconstructs maps from the latent space
- A reparameterization trick for sampling from the latent space
The VAE takes the mask into account to handle variable-sized inputs. The reparameterization step is sketched below.
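A minimal sketch of the reparameterization trick, assuming the encoder outputs a mean and log-variance per latent dimension (the exact method names in `minecraft_vae.py` may differ):

```python
import torch

def reparameterize(mu, logvar):
    """Draw z ~ N(mu, sigma^2) while keeping the sampling step differentiable."""
    std = torch.exp(0.5 * logvar)   # sigma from log-variance
    eps = torch.randn_like(std)     # noise from N(0, I)
    return mu + eps * std           # gradients flow through mu and std
```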
The CFM learns to model the vector field of a continuous normalizing flow in the latent space. It can be conditioned on additional context vectors for controlled generation.
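To make the idea concrete, here is a sketch of a common conditional flow matching objective with straight-line paths between noise and data latents; the repository's actual loss and the `flow_model(z_t, t, context)` call signature are assumptions.

```python
import torch

def cfm_loss(flow_model, z1, context=None):
    """Straight-path flow matching loss on VAE latents z1 of shape [B, latent_dim]."""
    z0 = torch.randn_like(z1)                     # noise endpoint of the path
    t = torch.rand(z1.size(0), 1, device=z1.device)
    zt = (1.0 - t) * z0 + t * z1                  # point on the straight path at time t
    target_velocity = z1 - z0                     # d z_t / d t along that path
    pred_velocity = flow_model(zt, t, context)    # assumed call signature
    return ((pred_velocity - target_velocity) ** 2).mean()
```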
The repository contains the following files:
- `minecraft_dataset.py`: PyTorch dataset for Minecraft schematic files
- `schematic_loader.py`: Utility for loading schematic files
- `main.py`: Main script for training the diffusion model
- `sample.py`: Script for sampling from the trained diffusion model
- `minecraft_vae.py`: Implementation of the Variational Autoencoder
- `vae_demo.py`: Demo script for the VAE
- `flow_matching.py`: Implementation of the Conditional Flow Matching network
- `test_dataset.py`: Test script for the dataset
The diffusion model can generate samples by gradually denoising random noise:
```python
# Sample from the diffusion model
samples = diffusion.p_sample_loop(
    model,
    (batch_size, num_blocks, 16, 16, 16),
    clip_denoised=True,
    model_kwargs={"mask": mask},
    device="cuda",
)
```
The VAE can generate samples by sampling from the latent space and decoding:
```python
# Sample from the VAE
samples = vae.sample(num_samples=5, device="cuda")
```
The Flow Matching network can generate samples by solving the ODE:
```python
# Generate samples with the Flow Matching network
samples = generate_samples_with_flow(vae, flow_model, device="cuda", num_samples=5)
```
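For intuition, solving the flow ODE can be as simple as Euler-integrating the learned velocity field from noise at t = 0 to t = 1 and decoding the result. The sketch below illustrates the idea; it is not the internals of `generate_samples_with_flow`, and the `flow_model(z, t, context)` signature and `vae.decode` method are assumptions.

```python
import torch

@torch.no_grad()
def euler_sample(vae, flow_model, num_samples, latent_dim, steps=100, device="cuda"):
    """Integrate dz/dt = v(z, t) with fixed-step Euler, then decode the latents."""
    z = torch.randn(num_samples, latent_dim, device=device)
    dt = 1.0 / steps
    for i in range(steps):
        t = torch.full((num_samples, 1), i * dt, device=device)
        z = z + dt * flow_model(z, t, None)   # assumed (z, t, context) signature
    return vae.decode(z)                      # assumed decoder method
```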
You can customize the models by adjusting the following parameters.

Diffusion model:
- `model_channels`: Number of channels in the UNet model
- `num_res_blocks`: Number of residual blocks in each UNet layer
- `attention_resolutions`: Resolutions at which to apply attention
- `dropout`: Dropout rate
- `channel_mult`: Channel multiplier for each UNet layer
- `diffusion_steps`: Number of diffusion steps
- `noise_schedule`: Type of noise schedule (`"linear"` or `"cosine"`; sketched below)
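For reference, the two schedule names usually correspond to the standard definitions below (linear from Ho et al., cosine from Nichol & Dhariwal); the repository's exact constants may differ.

```python
import math
import torch

def make_betas(noise_schedule, diffusion_steps):
    """Return per-step betas for the chosen schedule."""
    if noise_schedule == "linear":
        return torch.linspace(1e-4, 0.02, diffusion_steps)
    # Cosine schedule: betas derived from a squared-cosine alpha-bar curve.
    s = 0.008
    t = torch.arange(diffusion_steps + 1) / diffusion_steps
    alpha_bar = torch.cos((t + s) / (1 + s) * math.pi / 2) ** 2
    betas = 1.0 - alpha_bar[1:] / alpha_bar[:-1]
    return betas.clamp(max=0.999)
```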
VAE:
- `latent_dim`: Dimension of the latent space
- `embedding_dim`: Dimension of the block embeddings
- `hidden_dims`: Dimensions of the hidden layers
Conditional Flow Matching:
- `context_dim`: Dimension of the context vector for conditional generation
- `hidden_dims`: Dimensions of the hidden layers in the flow model
This project is licensed under the MIT License - see the LICENSE file for details.