
M3DDM+: An Improved Video Outpainting by a Modified Masking Strategy

Takuya Murakawa, Takumi Fukuzawa, Ning Ding, Toru Tamaki
Nagoya Institute of Technology

IWAIT 2026

arXiv Project Page Hugging Face

Demo video: comparison.mp4

TODO

  • add results sample
  • add FVD metrics in evaluate code
  • add contact
  • add quickstart inference sample

Environment Setup

  1. Create and activate a virtual environment using Python 3.12 (or 3.10+):

     python3.12 -m venv venv
     source venv/bin/activate

  2. Install all dependencies from requirements.txt:

     pip install -r requirements.txt

Note: Make sure you have Python 3.10 or later installed. Our testing environment uses Python 3.12.3 with PyTorch 2.8.0+cu128 and CUDA 13.1.

Troubleshooting

If you encounter the following error during setup:

ImportError: cannot import name 'cached_download' from 'huggingface_hub'

Run the following command to fix it:

pip install huggingface-hub==0.25.2

Reference: Stack Overflow - ImportError: cannot import name 'cached_download' from 'huggingface_hub'

Download Models

Before you can run the project, you need to download the following:

  1. Pre-trained Stable Diffusion Model Weights:

     We use the VAE encoder and decoder from the Stable Diffusion model. Download the pre-trained Stable Diffusion v1.5 weights from:
     https://huggingface.co/runwayml/stable-diffusion-v1-5

  2. Video-Outpainting Model Checkpoints:

     To get the pre-trained M3DDM-Plus model weights, download them from the Hugging Face repository:
     https://huggingface.co/MurakawaTakuya/M3DDM-Plus
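Once both repositories are downloaded, a small script can sanity-check the local directory layout before running inference. The expected file names below (model_index.json, vae/config.json) assume the standard diffusers-style layout of Stable Diffusion v1.5; adjust them to match how you saved the checkpoints locally:

```python
from pathlib import Path

def missing_files(model_dir, expected):
    """Return the expected relative paths that are absent under model_dir."""
    root = Path(model_dir)
    return [rel for rel in expected if not (root / rel).exists()]

# Assumed diffusers-style layout of stable-diffusion-v1-5; only the VAE is used.
SD_EXPECTED = ["model_index.json", "vae/config.json"]

missing = missing_files("stable-diffusion-v1-5", SD_EXPECTED)
if missing:
    print("Missing files:", missing)
else:
    print("Stable Diffusion directory looks complete.")
```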

Inference

You can run the inference code with the following command:

CUDA_VISIBLE_DEVICES=0 python src/inference.py \
  --input_video_path "path/to/input_video.mp4" \
  --pretrained_sd_dir "stable-diffusion-v1-5" \
  --video_outpainting_model_dir "M3DDM-Plus" \
  --output_dir "path/to/output_directory" \
  --target_ratio_list "1:1" \
  --output_size 256

Parameters

  • video_outpainting_model_dir: The directory where the video-outpainting model weights are stored. If ??? is in the root directory, set this parameter to "???".
  • target_ratio_list: The aspect ratio for the output video. You can input a single value such as "1:1", "16:9", or "9:16", or you can input a list like "16:9,9:16". For better results, we recommend inputting a single value.
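The --target_ratio_list string holds one or more ratios separated by commas. A minimal sketch of how such a value might be parsed into width:height pairs (parse_ratio_list is a hypothetical helper for illustration, not the repository's implementation):

```python
def parse_ratio_list(value):
    """Parse a string like '16:9,9:16' into a list of (width, height) int tuples."""
    ratios = []
    for item in value.split(","):
        w, h = item.strip().split(":")
        ratios.append((int(w), int(h)))
    return ratios

print(parse_ratio_list("16:9,9:16"))  # → [(16, 9), (9, 16)]
```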

GPU memory

Inference requires approximately 13GB of VRAM and takes ??? minutes for a 256×256-resolution, ???-frame video on a single NVIDIA RTX 8000. (Increasing the number of frames does not increase GPU memory usage.)
To save GPU memory, set PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True. Using --enable_attention_slicing also reduces memory consumption, at the cost of inference speed.
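The two memory-saving options can be combined in one invocation. The sketch below only exports and echoes the allocator setting; the commented line indicates where the actual inference command would go (the elided arguments are the ones shown above):

```shell
# Opt in to PyTorch's expandable-segments allocator to reduce fragmentation.
export PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True
echo "PYTORCH_CUDA_ALLOC_CONF=$PYTORCH_CUDA_ALLOC_CONF"

# Then run inference with attention slicing enabled, e.g.:
# CUDA_VISIBLE_DEVICES=0 python src/inference.py ... --enable_attention_slicing
```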

Training

You can run the training code with the following command:

CUDA_VISIBLE_DEVICES=1 python src/train.py \
  --data_dir "path/to/dataset/directory" \
  --size 128 \
  --epochs 5 \
  --lr 1e-5 \
  --pretrained_sd_dir "stable-diffusion-v1-5" \
  --video_model_dir "M3DDM-Plus" \
  --gpus 1 \
  --output_dir "output" \
  --max_samples 10000 \
  --eval_video_dir "path/to/evaluation_video_directory" \
  --eval_crop_ratio 0.25 \
  --eval_crop_axis "horizontal" \
  --eval_target_ratio_list "16:9" \
  --limit_val_batches 1000

Parameters

  • data_dir: The directory where the training data is stored. The directory should contain /train and /val directories.
  • video_model_dir: The directory where the video-outpainting model weights are stored. If ??? is in the root directory, set this parameter to "???".
  • output_dir: The directory where the training results will be saved.
  • max_samples: The maximum number of samples to use for training.
  • eval_video_dir: The directory where the evaluation data is stored.
  • eval_crop_ratio: The fraction of each frame that is cropped away along the chosen axis; the model outpaints the removed region during evaluation.
  • eval_crop_axis: The axis (horizontal or vertical) along which evaluation frames are cropped.
  • eval_target_ratio_list: The aspect ratio for the output video. You can input a single value such as "1:1", "16:9", or "9:16", or you can input a list like "16:9,9:16". For better results, we recommend inputting a single value.
  • limit_val_batches: The number of videos to use for validation.

Use --disable_validation to disable validation.
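To make eval_crop_ratio and eval_crop_axis concrete, here is a sketch of the resulting visible-frame size, under the assumption that the ratio is the fraction removed along the chosen axis (cropped_size is a hypothetical helper, not the repository's implementation):

```python
def cropped_size(width, height, crop_ratio, crop_axis):
    """Size of the visible region after removing crop_ratio of the frame
    along the given axis; the removed part is what the model outpaints."""
    if crop_axis == "horizontal":
        return int(width * (1 - crop_ratio)), height
    elif crop_axis == "vertical":
        return width, int(height * (1 - crop_ratio))
    raise ValueError(f"unknown crop_axis: {crop_axis}")

# With --eval_crop_ratio 0.25 --eval_crop_axis horizontal on a 256x256 frame:
print(cropped_size(256, 256, 0.25, "horizontal"))  # → (192, 256)
```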

GPU memory

Training requires approximately 28GB of VRAM and takes ??? hours per epoch at 128×128 resolution with ??? steps per epoch on a single NVIDIA RTX 8000.
To reduce GPU memory usage, enable --enable_unet_gradient_checkpointing, which lowers memory consumption at the cost of training speed.

Evaluation

CUDA_VISIBLE_DEVICES=0 python src/evaluate.py \
  --video_dir "path/to/data/directory" \
  --pretrained_sd_dir "stable-diffusion-v1-5" \
  --video_outpainting_model_dir "M3DDM-Plus" \
  --target_ratio_list "16:9" \
  --crop_ratio 0.25 \
  --crop_axis "horizontal" \
  --output_size 256 \
  --limit_outpainting_frames -1

Parameters

  • video_dir: The directory where the evaluation data is stored.
  • pretrained_sd_dir: The directory where the pre-trained stable diffusion model weights are stored.
  • video_outpainting_model_dir: The directory where the video-outpainting model weights are stored. If ??? is in the root directory, set this parameter to "???".
  • target_ratio_list: The aspect ratio for the output video. You can input a single value such as "1:1", "16:9", or "9:16", or you can input a list like "16:9,9:16". For better results, we recommend inputting a single value.
  • crop_ratio: The fraction of each frame that is cropped away along the chosen axis; the model outpaints the removed region during evaluation.
  • crop_axis: The axis (horizontal or vertical) along which the evaluation frames are cropped.
  • output_size: The size of the output video.
  • limit_outpainting_frames: The number of frames to use for outpainting. Use -1 to use all frames.
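output_size together with target_ratio_list determines the final frame dimensions. Here is a sketch under two assumptions not confirmed by the repository: that output_size fixes the shorter side, and that the longer side is floored to a multiple of 8 for the VAE (output_dims is a hypothetical helper):

```python
def output_dims(output_size, ratio_w, ratio_h, multiple=8):
    """Derive (width, height): the shorter side is output_size, the longer side
    follows the aspect ratio, floored to a multiple of `multiple`."""
    if ratio_w >= ratio_h:
        width = (output_size * ratio_w // ratio_h) // multiple * multiple
        return width, output_size
    height = (output_size * ratio_h // ratio_w) // multiple * multiple
    return output_size, height

# --output_size 256 with a 16:9 target ratio:
print(output_dims(256, 16, 9))  # → (448, 256)
```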

GPU memory

Evaluation requires the same amount of VRAM as inference, and its runtime scales with the number of evaluation videos. To save GPU memory, set PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True. Using --enable_attention_slicing also reduces memory consumption, at the cost of inference speed.

Logging

This project uses Comet ML for experiment tracking and logging.

Add --disable_comet (or -dc) to disable logging to Comet.

Comet Configuration

Comet configuration uses two configuration files:

  • ./.comet.config in this project directory
  • ~/.comet.config in your home directory

For more details, refer to the Comet configuration documentation.

Important: Do not write API keys directly in code.

Global Configuration (Home Directory)

Create ~/.comet.config with settings common to all your projects as follows:

[comet]
api_key=XXXXXHereIsYourAPIKeyXXXXXXXX
workspace=your_workspace_name

[comet_logging]
hide_api_key=True

  • Set your Comet API key and default workspace
  • Set hide_api_key=True to prevent API keys from appearing in logs

Project Configuration

Copy the example configuration file .comet.config.example and name it .comet.config:

cp .comet.config.example .comet.config

Then edit ./.comet.config with your data:

[comet]
workspace=your_workspace_name # Change to your workspace name (comet user name)
project_name=M3DDM-Plus # Change to your project name (e.g. M3DDM-Plus-Video-Outpainting)

[comet_logging]
file=comet_logs/comet_{project}_{datetime}.log # Change the path to your desired location (optional)

  • Settings here override those in ~/.comet.config

Citation

If you find our work interesting and useful, please consider starring ⭐ the repo and citing our paper.

@inproceedings{murakawa2026m3ddmplus,
  title={M3DDM+: An improved video outpainting by a modified masking strategy},
  author={Murakawa, Takuya and Fukuzawa, Takumi and Ding, Ning and Tamaki, Toru},
  booktitle={Proceedings of the International Workshop on Advanced Imaging Technology (IWAIT)},
  year={2026}
}

Contact us

Please feel free to reach out to us.

Acknowledgement

The inference and pipeline code is based on the published code of M3DDM-Video-Outpainting. Since the original training and evaluation code is not published, we reproduced it from the M3DDM paper and modified it for our proposed method.
