Skip to content

HighDoping/Wan2.2-Mac

 
 

Repository files navigation

Wan2.2 Text-to-Video Model for Mac M-Series

WORK IN PROGRESS

This repository has the sole purpose of making Wan2.2 run efficiently on Mac M-Series chips. Memory saving is the priority.

Original problems

Mac M-Series chips have Unified Memory, so the original method of offloading models to the CPU still costs memory.

The original repo also loads all models at startup, which takes a lot of memory. (umt5-xxl is a 13B model!!)

Changes

  • Load models only when needed. (T5, base model, and vae)
  • Modify the offload_model method to delete the model from memory immediately after use.
  • Add quantized T5 model to reduce memory usage.
  • Use bf16 precision to reduce memory usage.

Problems

  • VAE tiling for VAE2.2 is broken now.

TODO

  • Fix VAE tiling for VAE2.2.
  • Add support for A14B model.

Installation

Follow the upstream instructions to install the dependencies and download the model.

Assuming you have Poetry installed, you can also install the dependencies with:

poetry install

If you want to use hugingface-cli or modelscope, you can install with:

poetry install --extras dev

Download the model with huggingface-cli or modelscope:

huggingface-cli download Wan-AI/Wan2.2-T2V-A14B --local-dir ./Wan2.2-T2V-A14B
huggingface-cli download Wan-AI/Wan2.2-I2V-A14B --local-dir ./Wan2.2-I2V-A14B
huggingface-cli download Wan-AI/Wan2.2-TI2V-5B --local-dir ./Wan2.2-TI2V-5B
modelscope download Wan-AI/Wan2.2-T2V-A14B --local_dir ./Wan2.2-T2V-A14B
modelscope download Wan-AI/Wan2.2-I2V-A14B --local_dir ./Wan2.2-I2V-A14B
modelscope download Wan-AI/Wan2.2-TI2V-5B --local_dir ./Wan2.2-TI2V-5B

To use quantized T5 model, download it from my 🤗 repo or use huggingface-cli and put it in the same folder as wan model:

huggingface-cli download HighDoping/umt5-xxl-encode-gguf --local-dir ./Wan2.2-TI2V-5B

Models from city96 also works. Only needs to change the model name in wan\configs

Then install llama.cpp from homebrew:

brew install llama.cpp

Usage

Text to video with Wan2.2-TI2V-5B

To generate a video, use the following command:

export PYTORCH_ENABLE_MPS_FALLBACK=1
python generate.py --task ti2v-5B --size "1280*704" --frame_num 41 --ckpt_dir ./Wan2.2-TI2V-5B --offload_model True --convert_model_dtype --t5_quant --device mps --prompt "Penguins fighting a polar bear in the arctic." --save_file output_video.mp4

--t5_quant enables the quantized T5 model.

For 32GB M4 Mac Mini, time taken: 1h37m. Result: TI2V_T2V

Image to video with Wan2.2-TI2V-5B

export PYTORCH_ENABLE_MPS_FALLBACK=1
python generate.py --task ti2v-5B --size "1280*704" --frame_num 25 --ckpt_dir ./Wan2.2-TI2V-5B --offload_model True --convert_model_dtype --t5_quant --device mps --image examples/i2v_input.JPG --prompt "Summer beach vacation style, a white cat wearing sunglasses sits on a surfboard. The fluffy-furred feline gazes directly at the camera with a relaxed expression. Blurred beach scenery forms the background featuring crystal-clear waters, distant green hills, and a blue sky dotted with white clouds. The cat assumes a naturally relaxed posture, as if savoring the sea breeze and warm sunlight. A close-up shot highlights the feline's intricate details and the refreshing atmosphere of the seaside."

For 32GB M4 Mac Mini, time taken: 47m. Result: TI2V_I2V

About

Wan2.2 for Mac.

Resources

License

Stars

Watchers

Forks

Languages

  • Python 98.3%
  • Shell 1.7%