Takuya Murakawa, Takumi Fukuzawa, Ning Ding, Toru Tamaki
Nagoya Institute of Technology
IWAIT 2026
(Comparison demo video: comparison.mp4)
TODO:
- Add results sample
- Add FVD metrics to the evaluation code
- Add contact information
- Add a quick-start inference sample
Create and activate a virtual environment using Python 3.12 (or 3.10+):

```bash
python3.12 -m venv venv
source venv/bin/activate
```

Install all dependencies from the `requirements.txt` file:

```bash
pip install -r requirements.txt
```

Note: Make sure you have Python 3.10 or later installed. Our testing environment uses Python 3.12.3 with PyTorch 2.8.0+cu128 and CUDA 13.1.
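A quick sanity check of the interpreter and PyTorch versions can be scripted; the sketch below is illustrative only (the 3.10 and 2.8 floors come from the tested environment noted above, and older PyTorch builds may still work):

```python
import sys

def parse_version(v: str) -> tuple[int, ...]:
    """Parse a version string such as '2.8.0+cu128' into (2, 8, 0),
    dropping any local build suffix after '+'."""
    return tuple(int(p) for p in v.split("+")[0].split("."))

def env_ok(python_version: tuple[int, int], torch_version: str) -> bool:
    """True if Python is 3.10+ and PyTorch is at least 2.8 (tested floor)."""
    return python_version >= (3, 10) and parse_version(torch_version) >= (2, 8)

if __name__ == "__main__":
    print("Python", sys.version.split()[0],
          "OK" if sys.version_info[:2] >= (3, 10) else "too old (need 3.10+)")
```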
If you encounter the following error during setup:

```
ImportError: cannot import name 'cached_download' from 'huggingface_hub'
```

run the following command to fix it:

```bash
pip install huggingface-hub==0.25.2
```

Reference: Stack Overflow - "ImportError: cannot import name 'cached_download' from 'huggingface_hub'"
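You can detect the problem before it bites with a small sketch that compares the installed `huggingface_hub` version against the 0.25.2 pin (assumes plain `X.Y.Z` version strings; `importlib.metadata` is in the standard library):

```python
from importlib import metadata

def needs_downgrade(installed: str, pinned: str = "0.25.2") -> bool:
    """`cached_download` was removed from newer huggingface_hub releases,
    so a version above the pin may trigger the ImportError above."""
    as_tuple = lambda v: tuple(int(p) for p in v.split(".")[:3])
    return as_tuple(installed) > as_tuple(pinned)

if __name__ == "__main__":
    try:
        installed = metadata.version("huggingface_hub")
        if needs_downgrade(installed):
            print(f"huggingface_hub {installed} is newer than 0.25.2; "
                  "consider: pip install huggingface-hub==0.25.2")
    except metadata.PackageNotFoundError:
        print("huggingface_hub is not installed")
```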
Before you can run the project, you need to download the following:

- Pre-trained Stable Diffusion model weights:
  We use the VAE encoder and decoder from the Stable Diffusion model. Download the pre-trained Stable Diffusion v1.5 weights from:
  https://huggingface.co/runwayml/stable-diffusion-v1-5

- Video-outpainting model checkpoints:
  Download the pre-trained M3DDM-Plus model weights from the Hugging Face repository:
  https://huggingface.co/MurakawaTakuya/M3DDM-Plus
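Both checkpoints can also be fetched programmatically with `huggingface_hub.snapshot_download`; here is a sketch (the local directory names happen to match the `--pretrained_sd_dir` / `--video_outpainting_model_dir` values used in the commands below):

```python
try:
    from huggingface_hub import snapshot_download  # requires huggingface_hub installed
except ImportError:
    snapshot_download = None

def repo_dirname(repo_id: str) -> str:
    """Local folder name derived from a repo id,
    e.g. 'runwayml/stable-diffusion-v1-5' -> 'stable-diffusion-v1-5'."""
    return repo_id.split("/")[-1]

if __name__ == "__main__" and snapshot_download is not None:
    for repo_id in ("runwayml/stable-diffusion-v1-5", "MurakawaTakuya/M3DDM-Plus"):
        # Download the full repository snapshot into a local folder.
        snapshot_download(repo_id=repo_id, local_dir=repo_dirname(repo_id))
```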
You can run the inference code with the following command:

```bash
CUDA_VISIBLE_DEVICES=0 python src/inference.py \
    --input_video_path "path/to/input_video.mp4" \
    --pretrained_sd_dir "stable-diffusion-v1-5" \
    --video_outpainting_model_dir "M3DDM-Plus" \
    --output_dir "path/to/output_directory" \
    --target_ratio_list "1:1" \
    --output_size 256
```

Parameters:
- `video_outpainting_model_dir`: The directory where the video-outpainting model weights are stored. If ??? is in the root directory, set this parameter to `"???"`.
- `target_ratio_list`: The aspect ratio of the output video. You can pass a single value such as `"1:1"`, `"16:9"`, or `"9:16"`, or a comma-separated list such as `"16:9,9:16"`. For better results, we recommend passing a single value.

Inference requires approximately 13 GB of VRAM and takes ??? minutes for a 256x256-resolution, ???-frame video on a single NVIDIA RTX 8000. (Increasing the number of frames does not increase GPU memory usage.)

To save GPU memory, you can set `PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True`.
Also, passing `--enable_attention_slicing` reduces memory consumption at the cost of inference speed.
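To illustrate the shape of a `--target_ratio_list` value, the sketch below parses the comma-separated string and derives output dimensions under the assumption that `--output_size` is the shorter side, rounded to a multiple of 8 as diffusion VAEs typically require. This is a sketch only; the repository's actual sizing logic lives in `src/inference.py` and may differ.

```python
def parse_ratio_list(s: str) -> list[tuple[int, int]]:
    """Parse a --target_ratio_list value such as "16:9,9:16"
    into [(16, 9), (9, 16)]."""
    pairs = []
    for item in s.split(","):
        w, h = item.split(":")
        pairs.append((int(w), int(h)))
    return pairs

def output_dims(ratio: tuple[int, int], output_size: int) -> tuple[int, int]:
    """Illustrative only: treat output_size as the shorter side and scale
    the longer side by the aspect ratio, rounded to a multiple of 8."""
    w, h = ratio
    if w >= h:
        return round(output_size * w / h / 8) * 8, output_size
    return output_size, round(output_size * h / w / 8) * 8
```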
You can run the training code with the following command:

```bash
CUDA_VISIBLE_DEVICES=1 python src/train.py \
    --data_dir "path/to/dataset/directory" \
    --size 128 \
    --epochs 5 \
    --lr 1e-5 \
    --pretrained_sd_dir "stable-diffusion-v1-5" \
    --video_model_dir "M3DDM-Plus" \
    --gpus 1 \
    --output_dir "output" \
    --max_samples 10000 \
    --eval_video_dir "path/to/evaluation_video_directory" \
    --eval_crop_ratio 0.25 \
    --eval_crop_axis "horizontal" \
    --eval_target_ratio_list "16:9" \
    --limit_val_batches 1000
```

Parameters:
- `data_dir`: The directory where the training data is stored. It should contain `/train` and `/val` subdirectories.
- `video_model_dir`: The directory where the video-outpainting model weights are stored. If ??? is in the root directory, set this parameter to `"???"`.
- `output_dir`: The directory where the training results will be saved.
- `max_samples`: The maximum number of samples to use for training.
- `eval_video_dir`: The directory where the evaluation data is stored.
- `eval_crop_ratio`: The ratio by which the evaluation videos are cropped.
- `eval_crop_axis`: The axis along which the evaluation videos are cropped.
- `eval_target_ratio_list`: The aspect ratio of the output video. You can pass a single value such as `"1:1"`, `"16:9"`, or `"9:16"`, or a comma-separated list such as `"16:9,9:16"`. For better results, we recommend passing a single value.
- `limit_val_batches`: The number of videos to use for validation.
Use `--disable_validation` to disable validation.

Training requires approximately 28 GB of VRAM and takes ??? hours per epoch at 128x128 resolution with ??? steps per epoch on a single NVIDIA RTX 8000.

To reduce GPU memory usage, you can pass `--enable_unet_gradient_checkpointing`, which reduces memory consumption at the cost of training speed.
You can run the evaluation code with the following command:

```bash
CUDA_VISIBLE_DEVICES=0 python src/evaluate.py \
    --video_dir "path/to/data/directory" \
    --pretrained_sd_dir "stable-diffusion-v1-5" \
    --video_outpainting_model_dir "M3DDM-Plus" \
    --target_ratio_list "16:9" \
    --crop_ratio 0.25 \
    --crop_axis "horizontal" \
    --output_size 256 \
    --limit_outpainting_frames -1
```

Parameters:
- `video_dir`: The directory where the evaluation data is stored.
- `pretrained_sd_dir`: The directory where the pre-trained Stable Diffusion model weights are stored.
- `video_outpainting_model_dir`: The directory where the video-outpainting model weights are stored. If ??? is in the root directory, set this parameter to `"???"`.
- `target_ratio_list`: The aspect ratio of the output video. You can pass a single value such as `"1:1"`, `"16:9"`, or `"9:16"`, or a comma-separated list such as `"16:9,9:16"`. For better results, we recommend passing a single value.
- `crop_ratio`: The ratio by which the evaluation videos are cropped before outpainting.
- `crop_axis`: The axis along which the evaluation videos are cropped.
- `output_size`: The resolution of the output video.
- `limit_outpainting_frames`: The number of frames to outpaint. Use `-1` to use all frames.
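Evaluation crops part of each frame so the model can be asked to outpaint it back. The sketch below shows the dimensional effect under the assumption that `crop_ratio` is the fraction removed along `crop_axis`; the actual cropping lives in `src/evaluate.py`:

```python
def cropped_size(width: int, height: int,
                 crop_ratio: float, crop_axis: str) -> tuple[int, int]:
    """Assumed semantics: remove crop_ratio of the frame along crop_axis.
    'horizontal' shrinks the width, 'vertical' shrinks the height."""
    if crop_axis == "horizontal":
        return int(width * (1 - crop_ratio)), height
    if crop_axis == "vertical":
        return width, int(height * (1 - crop_ratio))
    raise ValueError(f"unknown crop_axis: {crop_axis!r}")
```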
Evaluation requires the same amount of VRAM and time as inference, multiplied by the number of evaluation videos.

To save GPU memory, you can set `PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True`.
Also, passing `--enable_attention_slicing` reduces memory consumption at the cost of inference speed.
This project uses Comet ML for experiment tracking and logging.
Add `--disable_comet` (or `-dc`) to disable logging to Comet.

Comet uses two configuration files:
- `./.comet.config` in this project directory
- `~/.comet.config` in your home directory

For more details, refer to the Comet configuration documentation.
Important: Do not write API keys directly in code.
Create `~/.comet.config` with settings common to all your projects as follows:

```ini
[comet]
api_key=XXXXXHereIsYourAPIKeyXXXXXXXX
workspace=your_workspace_name

[comet_logging]
hide_api_key=True
```

- Set your Comet API key and default workspace.
- Set `hide_api_key=True` to prevent API keys from appearing in logs.
Copy the example configuration file `.comet.config.example` and name it `.comet.config`:

```bash
cp .comet.config.example .comet.config
```

Then edit `./.comet.config` with your data:

```ini
[comet]
workspace=your_workspace_name # Change to your workspace name (Comet user name)
project_name=M3DDM-Plus # Change to your project name (e.g. M3DDM-Plus-Video-Outpainting)

[comet_logging]
file=comet_logs/comet_{project}_{datetime}.log # Change the path to your desired location (optional)
```

- Settings here override those in `~/.comet.config`.
If our work is helpful, please help to ⭐ the repo.
Please consider citing our paper if you found our work interesting and useful.
```bibtex
@inproceedings{murakawa2026m3ddmplus,
  title={M3DDM+: An improved video outpainting by a modified masking strategy},
  author={Murakawa, Takuya and Fukuzawa, Takumi and Ding, Ning and Tamaki, Toru},
  booktitle={Proceedings of the International Workshop on Advanced Imaging Technology (IWAIT)},
  year={2026}
}
```

Please feel free to reach out to us:
The inference and pipeline code is based on the published code of M3DDM-Video-Outpainting. Since the training and evaluation code of M3DDM is not published, we reproduced it based on the M3DDM paper and modified it for our proposed method.