<!--Copyright 2025 The HuggingFace Team. All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
-->

# EasyAnimate

[EasyAnimate](https://github.com/aigc-apps/EasyAnimate) is a transformer-based diffusion pipeline for generating AI images and videos, developed by Alibaba PAI.

The description from its GitHub page:
*EasyAnimate is a pipeline based on the transformer architecture, designed for generating AI images and videos, and for training baseline models and Lora models for Diffusion Transformer. We support direct prediction from pre-trained EasyAnimate models, allowing for the generation of videos with various resolutions, approximately 6 seconds in length, at 8fps (EasyAnimateV5.1, 1 to 49 frames). Additionally, users can train their own baseline and Lora models for specific style transformations.*

This pipeline was contributed by [bubbliiiing](https://github.com/bubbliiiing). The original codebase can be found [here](https://github.com/aigc-apps/EasyAnimate), and the original weights can be found under [hf.co/alibaba-pai](https://huggingface.co/alibaba-pai).

There are two official EasyAnimate checkpoints for text-to-video and video-to-video.

| checkpoints | recommended inference dtype |
|:---:|:---:|
| [`alibaba-pai/EasyAnimateV5.1-12b-zh`](https://huggingface.co/alibaba-pai/EasyAnimateV5.1-12b-zh) | torch.float16 |
| [`alibaba-pai/EasyAnimateV5.1-12b-zh-InP`](https://huggingface.co/alibaba-pai/EasyAnimateV5.1-12b-zh-InP) | torch.float16 |

There is one official EasyAnimate checkpoint available for image-to-video and video-to-video.

| checkpoints | recommended inference dtype |
|:---:|:---:|
| [`alibaba-pai/EasyAnimateV5.1-12b-zh-InP`](https://huggingface.co/alibaba-pai/EasyAnimateV5.1-12b-zh-InP) | torch.float16 |

There are two official EasyAnimate checkpoints available for control-to-video.

| checkpoints | recommended inference dtype |
|:---:|:---:|
| [`alibaba-pai/EasyAnimateV5.1-12b-zh-Control`](https://huggingface.co/alibaba-pai/EasyAnimateV5.1-12b-zh-Control) | torch.float16 |
| [`alibaba-pai/EasyAnimateV5.1-12b-zh-Control-Camera`](https://huggingface.co/alibaba-pai/EasyAnimateV5.1-12b-zh-Control-Camera) | torch.float16 |

For the EasyAnimateV5.1 series:
- Text-to-video (T2V) and image-to-video (I2V) generation work at multiple resolutions; the width and height can each vary from 256 to 1024.
- Both the T2V and I2V models support generation with 1 to 49 frames and work best at 49 frames. Exporting videos at 8 FPS is recommended (see the minimal sketch below).

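The sketch below shows how these values are passed at call time. It assumes a single GPU with enough memory for the float16 weights; the 512x512 resolution and the output filename are arbitrary choices for illustration, and `height`, `width`, and `num_frames` are assumed to behave as they do in other Diffusers video pipelines.

```py
import torch
from diffusers import EasyAnimatePipeline
from diffusers.utils import export_to_video

# Load the text-to-video checkpoint in the recommended dtype.
pipeline = EasyAnimatePipeline.from_pretrained(
    "alibaba-pai/EasyAnimateV5.1-12b-zh", torch_dtype=torch.float16
)
pipeline.to("cuda")

prompt = "A cat walks on the grass, realistic style."
video = pipeline(
    prompt=prompt,
    height=512,              # width and height can vary from 256 to 1024
    width=512,
    num_frames=49,           # 1 to 49 frames; 49 works best
    num_inference_steps=30,
).frames[0]

# Export at the recommended 8 FPS (49 frames is roughly 6 seconds).
export_to_video(video, "output.mp4", fps=8)
```
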
## Quantization

Quantization helps reduce the memory requirements of very large models by storing model weights in a lower precision data type. However, quantization may have varying impact on video quality depending on the video model.

Refer to the [Quantization](../../quantization/overview) overview to learn more about supported quantization backends and selecting a quantization backend that supports your use case. The example below demonstrates how to load a quantized [`EasyAnimatePipeline`] for inference with bitsandbytes.

```py
import torch
from diffusers import BitsAndBytesConfig as DiffusersBitsAndBytesConfig, EasyAnimateTransformer3DModel, EasyAnimatePipeline
from diffusers.utils import export_to_video

quant_config = DiffusersBitsAndBytesConfig(load_in_8bit=True)
transformer_8bit = EasyAnimateTransformer3DModel.from_pretrained(
    "alibaba-pai/EasyAnimateV5.1-12b-zh",
    subfolder="transformer",
    quantization_config=quant_config,
    torch_dtype=torch.float16,
)

pipeline = EasyAnimatePipeline.from_pretrained(
    "alibaba-pai/EasyAnimateV5.1-12b-zh",
    transformer=transformer_8bit,
    torch_dtype=torch.float16,
    device_map="balanced",
)

prompt = "A cat walks on the grass, realistic style."
negative_prompt = "bad detailed"
video = pipeline(prompt=prompt, negative_prompt=negative_prompt, num_frames=49, num_inference_steps=30).frames[0]
export_to_video(video, "cat.mp4", fps=8)
```

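If 8-bit weights are still too large, the same pattern extends to 4-bit bitsandbytes quantization. The sketch below only swaps the quantization config; how much 4-bit quantization degrades video quality for this particular model is an assumption to verify, not a documented recommendation.

```py
import torch
from diffusers import BitsAndBytesConfig as DiffusersBitsAndBytesConfig, EasyAnimateTransformer3DModel

# 4-bit NF4 quantization of the transformer: smaller memory footprint than 8-bit,
# with a potentially larger impact on output quality.
quant_config = DiffusersBitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,
)
transformer_4bit = EasyAnimateTransformer3DModel.from_pretrained(
    "alibaba-pai/EasyAnimateV5.1-12b-zh",
    subfolder="transformer",
    quantization_config=quant_config,
    torch_dtype=torch.float16,
)
```

Pass `transformer_4bit` to `EasyAnimatePipeline.from_pretrained` exactly as in the 8-bit example above.
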
## EasyAnimatePipeline

[[autodoc]] EasyAnimatePipeline
  - all
  - __call__

## EasyAnimatePipelineOutput

[[autodoc]] pipelines.easyanimate.pipeline_output.EasyAnimatePipelineOutput