158 commits
dabb12f
first draft
tolgacangoz Jun 13, 2025
89806ea
style
tolgacangoz Jun 13, 2025
f4b5748
upp
tolgacangoz Jun 13, 2025
9b45317
style
tolgacangoz Jun 13, 2025
8e5881b
Merge branch 'main' into add-magi-1
tolgacangoz Jun 14, 2025
03d50e2
2nd draft
tolgacangoz Jun 14, 2025
08287a9
2nd draft
tolgacangoz Jun 14, 2025
8784881
up
tolgacangoz Jun 14, 2025
ae03b7d
Refactor Magi1AttentionBlock to support rotary embeddings and integra…
tolgacangoz Jun 15, 2025
2a2df39
Enhance rotary positional embeddings with new parameters for grid cen…
tolgacangoz Jun 24, 2025
61e7cb0
Merge branch 'main' into add-magi-1
tolgacangoz Jun 24, 2025
9f63582
Refactor Magi1 VAE to align with DiT architecture
tolgacangoz Jun 24, 2025
743bd44
Refactor: Remove custom caching mechanism from Magi1 VAE
tolgacangoz Jun 25, 2025
0f09f74
Refactor Magi1 VAE decoder logic
tolgacangoz Jun 25, 2025
d3df80a
Refactor Magi1 VAE decoder to a patch-based architecture
tolgacangoz Jun 26, 2025
3fcd4c3
Refactor Magi1 VAE block implementation
tolgacangoz Jun 26, 2025
1301c9e
Refactor Magi1 VAE blocks to use standard attention processor
tolgacangoz Jun 26, 2025
16218e8
Refactor: Simplify Magi1 VAE decoder architecture
tolgacangoz Jun 26, 2025
1537b5b
Refactor: Convert Magi1 encoder to a Vision Transformer architecture
tolgacangoz Jun 26, 2025
7603067
Refactor: Simplify and streamline Magi1 VAE architecture
tolgacangoz Jun 27, 2025
1898e19
style
tolgacangoz Jun 27, 2025
6e6ba3e
Refactor Magi1 VAE configuration and parameters
tolgacangoz Jun 27, 2025
9a4b252
Refactor Magi1 VAE to remove quantization steps
tolgacangoz Jun 27, 2025
499111d
Refactor MAGI1 VAE conversion for ViT architecture
tolgacangoz Jun 27, 2025
d5f5594
Rename `AutoencoderKLMagi` to `AutoencoderKLMagi1`
tolgacangoz Jun 27, 2025
0cb50c9
Refactor: Rename Magi to Magi1
tolgacangoz Jun 27, 2025
af5b575
style
tolgacangoz Jun 27, 2025
7a4af97
Merge branch 'main' into add-magi-1
tolgacangoz Jun 27, 2025
b5e140b
Refactor: Update references from `MagiPipeline` to `Magi1Pipeline` ac…
tolgacangoz Jun 27, 2025
eead329
Enhance Magi-1 checkpoint loading robustness
tolgacangoz Jun 28, 2025
1643342
Fixes tensor shape in MAGI-1 attention processor
tolgacangoz Jun 28, 2025
14dff1f
Refactor: Simplify VAE checkpoint conversion and integrate hf_hub_dow…
tolgacangoz Jun 28, 2025
069f510
Refactor: Remove convert_magi_checkpoint function and streamline VAE …
tolgacangoz Jun 28, 2025
85729e0
Add Magi1 models and pipelines to the module initialization
tolgacangoz Jun 28, 2025
9389d0b
Fix: Update Magi pipeline names to include versioning
tolgacangoz Jun 28, 2025
657f569
Refactor: Rename references to autoencoder_kl_magi to autoencoder_kl_…
tolgacangoz Jun 28, 2025
b12796b
renaming
tolgacangoz Jun 28, 2025
dedea6f
Refactor: Update references to Magi pipelines and classes to include …
tolgacangoz Jun 28, 2025
fc99d53
Refactor: Comment out unused imports related to Magi1LoraLoaderMixin …
tolgacangoz Jun 28, 2025
5616238
Refactor Magi1 encoder to support variational encoding
tolgacangoz Jun 28, 2025
6d17954
Refactor: Update text encoder and tokenizer initialization in the mai…
tolgacangoz Jun 28, 2025
f856187
Refactor: Update text encoder and tokenizer to use DeepFloyd model
tolgacangoz Jun 28, 2025
dc9bb61
Refactor: Update MAGI-1 transformer conversion script and related com…
tolgacangoz Jun 28, 2025
6027704
Refactor MAGI-1 conversion script for accurate loading
tolgacangoz Jun 28, 2025
58dc666
style
tolgacangoz Jun 28, 2025
87299a4
fix-copies
tolgacangoz Jun 28, 2025
017cfc3
Refactor: Remove einops dependency in Magi1 VAE
tolgacangoz Jun 28, 2025
ecece86
style
tolgacangoz Jun 28, 2025
33b6a65
style
tolgacangoz Jun 28, 2025
e725461
Rename autoencoder_kl_magi.md to autoencoder_kl_magi1.md
tolgacangoz Jul 4, 2025
7415473
Refactor: Rename MagiTransformer classes to Magi1Transformer for cons…
tolgacangoz Jul 4, 2025
2d29f94
Refactor: Rename Magi1AttnProcessor2_0 and Magi1TransformerBlock clas…
tolgacangoz Jul 5, 2025
b5f58aa
Merge branch 'main' into add-magi-1
tolgacangoz Jul 5, 2025
85b2b74
Improve Magi1 VAE to handle variable input resolutions
tolgacangoz Jul 6, 2025
d43d6dd
Refactor: Comment out _keep_in_fp32_modules and remove clamping in Au…
tolgacangoz Jul 6, 2025
e1c548b
style
tolgacangoz Jul 6, 2025
03a4b4c
Refactor: Replace `FP32LayerNorm` with a manual implementation
tolgacangoz Jul 6, 2025
1bfb06e
Removes residual connection in VAE attention processor
tolgacangoz Jul 6, 2025
a6e18e6
up
tolgacangoz Jul 6, 2025
c04df0f
tolgacangoz Jul 6, 2025
04c0b09
up
tolgacangoz Jul 6, 2025
a75997c
style
tolgacangoz Jul 6, 2025
d48c5f6
Refactor attention processing to improve tensor shape handling in Mag…
tolgacangoz Jul 7, 2025
72f97be
Add Magi1VAELayerNorm class for improved integration in Magi1VAEAttnP…
tolgacangoz Jul 7, 2025
ba9d3ff
Refactor Magi1VAEAttnProcessor2_0 to improve query, key, and value ha…
tolgacangoz Jul 7, 2025
535c0dc
Refactor: Remove `timm` dependency in Magi1 VAE
tolgacangoz Jul 7, 2025
0b0f1c5
style
tolgacangoz Jul 7, 2025
3ccd666
Refactor MAGI-1 conversion script and update model
tolgacangoz Jul 7, 2025
4bbb797
Refactor: Remove redundant weight and bias assignments in transformer…
tolgacangoz Jul 7, 2025
db8ff3e
Feat: Add dedicated caption embedder for MAGI-1
tolgacangoz Jul 7, 2025
c8ed5bf
Merge branch 'main' into add-magi-1
tolgacangoz Jul 10, 2025
ce8838c
Refactor Magi1 embedding and remove CFG logic
tolgacangoz Jul 10, 2025
eeeac90
Apply tanh softcap to time embedding
tolgacangoz Jul 10, 2025
457a28d
Enhance Magi1TimeTextCaptionEmbedding by using get_parameter_dtype fo…
tolgacangoz Jul 12, 2025
2874c64
Refactor Magi-1 text conditioning and output normalization
tolgacangoz Jul 12, 2025
a284406
style
tolgacangoz Jul 13, 2025
2f9a79d
Refactor Magi1Transformer3DModel output block and cleanup
tolgacangoz Jul 13, 2025
00d174f
up
tolgacangoz Jul 14, 2025
ae4f3dd
Merge branch 'main' into add-magi-1
tolgacangoz Jul 17, 2025
6aaf2bb
Merge branch 'main' into add-magi-1
tolgacangoz Jul 24, 2025
dcf2873
Simplifies patch embedding in Magi1
tolgacangoz Jul 24, 2025
22fe543
Implements text projection for Magi1
tolgacangoz Jul 24, 2025
4ce8133
Merge branch 'main' into add-magi-1
tolgacangoz Aug 7, 2025
2fd296d
Implement tiled encode
kuantuna Aug 9, 2025
6a37e3b
Refactor variable names
kuantuna Aug 9, 2025
1a68a98
Reintroducing deleted comments
kuantuna Aug 9, 2025
10a8435
Better naming
kuantuna Aug 9, 2025
f1288e1
Implement tiled decode
kuantuna Aug 10, 2025
9a962af
Add the temporal if check for tiled decode
kuantuna Aug 10, 2025
3b07597
Remove extra blank line
kuantuna Aug 10, 2025
166423d
Fix typo
kuantuna Aug 10, 2025
c7ff91e
Refactor magic number
kuantuna Aug 10, 2025
e5309a8
Remove redundant expand
kuantuna Aug 10, 2025
4d535d6
Fix slicing bug
kuantuna Aug 10, 2025
0695889
Resolve global tile order bug
kuantuna Aug 10, 2025
2522bc3
Add comment to first frame slicing
kuantuna Aug 10, 2025
b792e0e
Remove latentC calc
kuantuna Aug 10, 2025
f874853
Merge branch 'main' into add-magi-1
tolgacangoz Aug 11, 2025
14a77c9
Refactors Magi1 VAE attention processor
tolgacangoz Aug 11, 2025
7e7982b
style
tolgacangoz Aug 11, 2025
1600a44
Refactor VAE state dict handling and enhance Magi1 attention processor
tolgacangoz Aug 11, 2025
97437b8
style
tolgacangoz Aug 11, 2025
5ff8149
Refactors MAGI-1 attention mechanism
tolgacangoz Aug 13, 2025
0885025
Refactor Magi1 attention for optimization and clarity
tolgacangoz Aug 13, 2025
fc94ab1
Merge branch 'add-magi-1' into tuna/feat/implement-vae-tiling
tolgacangoz Aug 14, 2025
079be57
up
tolgacangoz Aug 16, 2025
f0707d6
style
tolgacangoz Aug 16, 2025
44cfaff
Merge branch 'main' into add-magi-1
tolgacangoz Aug 16, 2025
8f3798c
up attn
tolgacangoz Aug 20, 2025
05dbc6e
Refactor rotary embedding and simplify transformer block
tolgacangoz Aug 20, 2025
b9b070f
Merge branch 'main' into add-magi-1
tolgacangoz Aug 22, 2025
0badb18
up
tolgacangoz Aug 22, 2025
71fa981
simplify
tolgacangoz Aug 22, 2025
7666d3c
Refactor Magi1 attention for numerical stability
tolgacangoz Aug 22, 2025
0730ef5
up
tolgacangoz Aug 22, 2025
a56fb9b
Merge branch 'add-magi-1' into tuna/feat/implement-vae-tiling
tolgacangoz Aug 23, 2025
9e756bd
Merge pull request #6 from kuantuna/tuna/feat/implement-vae-tiling
tolgacangoz Aug 23, 2025
bdb08c4
Removes copied from...
tolgacangoz Aug 23, 2025
a0191ae
Merge branch 'main' into add-magi-1
tolgacangoz Aug 24, 2025
98b836e
Refactor MAGI-1 conversion script and update state dict
tolgacangoz Aug 25, 2025
f2ceae7
Refactor Magi-1 transformer to align with original model
tolgacangoz Aug 25, 2025
9661d25
style
tolgacangoz Aug 25, 2025
a3b166c
up
tolgacangoz Aug 25, 2025
f85df6d
Removes the CLIPVisionModel image encoder from the Magi1 pipeline and…
tolgacangoz Aug 25, 2025
33c1b68
up
tolgacangoz Aug 25, 2025
da339ba
Merge branch 'main' into add-magi-1
tolgacangoz Aug 26, 2025
b069582
Refactor Magi-1 rotary position embeddings
tolgacangoz Aug 27, 2025
55140dc
style
tolgacangoz Aug 27, 2025
e0d0602
Refactor rotary position embedding implementation
tolgacangoz Aug 28, 2025
69c066b
Refactor: Remove image conditioning from MAGI-1 model cuz it insert i…
tolgacangoz Aug 28, 2025
dc6b017
Refactor Magi-1 transformer forward pass
tolgacangoz Aug 28, 2025
bdc14b5
simplify
tolgacangoz Aug 29, 2025
73ba2c7
Merge branch 'main' into add-magi-1
tolgacangoz Sep 15, 2025
98cdfb5
Begin to propose adding support for the `MagiAttention` backend to th…
tolgacangoz Sep 15, 2025
d03d4bc
style
tolgacangoz Sep 15, 2025
b6d973e
Merge branch 'main' into add-magi-1
tolgacangoz Sep 24, 2025
98e92fc
Add support for `parallel_config` in the attention processor
tolgacangoz Sep 24, 2025
92cbe8f
Up for the `MagiAttention` backend.
tolgacangoz Oct 2, 2025
041a7e1
Merge branch 'main' into add-magi-1
tolgacangoz Oct 3, 2025
8fb783e
Merge branch 'main' into add-magi-1
tolgacangoz Oct 18, 2025
a619b6b
[test] add unit tests for `AutoencoderKLMagi1` model
tolgacangoz Oct 18, 2025
a730a66
[test] remove obsolete tests for `AutoencoderKLMagi1` model
tolgacangoz Oct 18, 2025
de2277e
up FeedForward import
tolgacangoz Oct 18, 2025
c9bbb2d
up tr
tolgacangoz Oct 19, 2025
cb4d92e
up template for pipet2v
tolgacangoz Oct 19, 2025
65b59e3
up tr
tolgacangoz Oct 19, 2025
c94d554
up pipet2v
tolgacangoz Oct 19, 2025
8d7ab21
style
tolgacangoz Oct 19, 2025
454b26d
up
tolgacangoz Oct 19, 2025
91e0bdf
up i2v
tolgacangoz Oct 19, 2025
0370762
upp
tolgacangoz Oct 19, 2025
1f36d30
Enhance MAGI-1 documentation and improve Video-to-Video pipeline with…
tolgacangoz Oct 19, 2025
8fff46a
up pipes
tolgacangoz Oct 19, 2025
1286419
style
tolgacangoz Oct 19, 2025
b99c0ea
Merge branch 'main' into add-magi-1
tolgacangoz Oct 21, 2025
8f94ae8
up
tolgacangoz Oct 21, 2025
f715c22
Enhance MAGI-1 transformer checkpoint conversion with detailed verifi…
tolgacangoz Oct 22, 2025
e15e1de
Implement MAGI variable-length attention backend and update MAGI-1 pi…
tolgacangoz Oct 22, 2025
4 changes: 4 additions & 0 deletions docs/source/en/_toctree.yml
@@ -372,6 +372,8 @@
title: Lumina2Transformer2DModel
- local: api/models/lumina_nextdit2d
title: LuminaNextDiT2DModel
- local: api/models/magi1_transformer_3d
title: Magi1Transformer3DModel
- local: api/models/mochi_transformer3d
title: MochiTransformer3DModel
- local: api/models/omnigen_transformer
@@ -430,6 +432,8 @@
title: AutoencoderKLHunyuanVideo
- local: api/models/autoencoderkl_ltx_video
title: AutoencoderKLLTXVideo
- local: api/models/autoencoder_kl_magi1
title: AutoencoderKLMagi1
- local: api/models/autoencoderkl_magvit
title: AutoencoderKLMagvit
- local: api/models/autoencoderkl_mochi
34 changes: 34 additions & 0 deletions docs/source/en/api/models/autoencoder_kl_magi1.md
@@ -0,0 +1,34 @@
<!-- Copyright 2024 The HuggingFace Team. All rights reserved.

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License. -->

# AutoencoderKLMagi1

The 3D variational autoencoder (VAE) model with KL loss used in [MAGI-1: Autoregressive Video Generation at Scale](https://arxiv.org/abs/2505.13211) by Sand.ai.

MAGI-1 uses a transformer-based VAE with 8x spatial and 4x temporal compression, offering fast decoding and highly competitive reconstruction quality.

The model can be loaded with the following code snippet.

```python
import torch
from diffusers import AutoencoderKLMagi1

vae = AutoencoderKLMagi1.from_pretrained("sand-ai/MAGI-1", subfolder="vae", torch_dtype=torch.float32)
```
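
As a quick sanity check on the compression factors, the sketch below encodes a dummy clip and inspects the latent shape. It assumes the standard diffusers VAE API (`encode` returning an output with a `latent_dist`); the latent channel count and the exact temporal mapping depend on the checkpoint, so the shapes in the comments are approximate.

```python
import torch

# With 8x spatial and 4x temporal compression, a 16-frame 256x256 clip
# should map to a latent grid of roughly (16/4) x (256/8) x (256/8).
video = torch.randn(1, 3, 16, 256, 256)  # (batch, channels, frames, height, width)

with torch.no_grad():
    latents = vae.encode(video).latent_dist.sample()

print(latents.shape)  # expected: (1, latent_channels, ~4, 32, 32)
```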

## AutoencoderKLMagi1

[[autodoc]] AutoencoderKLMagi1
- decode
- all

## DecoderOutput

[[autodoc]] models.autoencoders.vae.DecoderOutput
32 changes: 32 additions & 0 deletions docs/source/en/api/models/magi1_transformer_3d.md
@@ -0,0 +1,32 @@
<!-- Copyright 2024 The HuggingFace Team. All rights reserved.

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License. -->

# Magi1Transformer3DModel

A Diffusion Transformer model for 3D video-like data was introduced in [MAGI-1: Autoregressive Video Generation at Scale](https://arxiv.org/abs/2505.13211) by Sand.ai.

MAGI-1 is an autoregressive denoising video generation model that generates videos chunk-by-chunk instead of as a whole. Each chunk (24 frames) is denoised holistically, and the generation of the next chunk begins as soon as the current one reaches a certain level of denoising.
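
This pipelining can be pictured with the small, self-contained sketch below. The numbers are hypothetical and this is not the pipeline's internal scheduler; it only illustrates how staggered start times keep several chunks (here up to four) in flight at different noise levels.

```python
# Illustrative schedule for pipelined chunk denoising (hypothetical numbers,
# not the pipeline's internal code).
num_chunks = 5  # chunks in the video
num_steps = 8   # denoising steps per chunk
offset = 2      # a chunk starts after its predecessor has taken this many steps

for global_step in range(num_steps + offset * (num_chunks - 1)):
    # Chunks currently in flight, each at its own noise level.
    active = [c for c in range(num_chunks) if 0 <= global_step - offset * c < num_steps]
    print(f"step {global_step:2d}: denoising chunks {active}")
```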

The model can be loaded with the following code snippet.

```python
import torch
from diffusers import Magi1Transformer3DModel

transformer = Magi1Transformer3DModel.from_pretrained("sand-ai/MAGI-1", subfolder="transformer", torch_dtype=torch.bfloat16)
```

## Magi1Transformer3DModel

[[autodoc]] Magi1Transformer3DModel

## Transformer2DModelOutput

[[autodoc]] models.modeling_outputs.Transformer2DModelOutput
261 changes: 261 additions & 0 deletions docs/source/en/api/pipelines/magi1.md
@@ -0,0 +1,261 @@
<!-- Copyright 2024 The HuggingFace Team. All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License. -->

<div style="float: right;">
<div class="flex flex-wrap space-x-1">
<a href="https://huggingface.co/docs/diffusers/main/en/tutorials/using_peft_for_inference" target="_blank" rel="noopener">
<img alt="LoRA" src="https://img.shields.io/badge/LoRA-d8b4fe?style=flat"/>
</a>
</div>
</div>

# MAGI-1

[MAGI-1: Autoregressive Video Generation at Scale](https://arxiv.org/abs/2505.13211) by Sand.ai.

*MAGI-1 is an autoregressive video generation model that generates videos chunk-by-chunk instead of as a whole. Each chunk (24 frames) is denoised holistically, and the generation of the next chunk begins as soon as the current one reaches a certain level of denoising. This pipeline design enables concurrent processing of up to four chunks for efficient video generation. The model leverages a specialized architecture with a transformer-based VAE with 8x spatial and 4x temporal compression, and a diffusion transformer with several key innovations including Block-Causal Attention, Parallel Attention Block, QK-Norm and GQA, Sandwich Normalization in FFN, SwiGLU, and Softcap Modulation.*
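
To make two of those ingredients concrete, the snippet below sketches a SwiGLU feed-forward block wrapped in sandwich normalization (one LayerNorm on the input and another on the branch output, inside the residual). This is a minimal illustration of the general techniques, not the repository's actual module.

```py
import torch
import torch.nn as nn
import torch.nn.functional as F

class SandwichSwiGLU(nn.Module):
    """Minimal sketch: SwiGLU FFN with sandwich normalization."""

    def __init__(self, dim: int, hidden_dim: int):
        super().__init__()
        self.pre_norm = nn.LayerNorm(dim)   # normalize the input to the FFN
        self.post_norm = nn.LayerNorm(dim)  # normalize the FFN output before the residual add
        self.w_gate = nn.Linear(dim, hidden_dim, bias=False)
        self.w_up = nn.Linear(dim, hidden_dim, bias=False)
        self.w_down = nn.Linear(hidden_dim, dim, bias=False)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        h = self.pre_norm(x)
        h = self.w_down(F.silu(self.w_gate(h)) * self.w_up(h))  # SwiGLU
        return x + self.post_norm(h)

block = SandwichSwiGLU(dim=64, hidden_dim=256)
out = block(torch.randn(1, 16, 64))
print(out.shape)  # torch.Size([1, 16, 64])
```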

The original codebase can be found at [SandAI-org/MAGI-1](https://github.com/SandAI-org/MAGI-1).

This model was contributed by [M. Tolga Cangöz](https://github.com/tolgacangoz).

You can find the MAGI-1 checkpoints under the [sand-ai](https://huggingface.co/sand-ai) organization.

The following MAGI-1 models are supported in Diffusers:

**Base Models:**
- [MAGI-1 24B](https://huggingface.co/sand-ai/MAGI-1)
- [MAGI-1 4.5B](https://huggingface.co/sand-ai/MAGI-1-4.5B)

**Distilled Models (faster inference):**
- [MAGI-1 24B Distill](https://huggingface.co/sand-ai/MAGI-1/tree/main/ckpt/magi/24B_distill)
- [MAGI-1 24B Distill+Quant (FP8)](https://huggingface.co/sand-ai/MAGI-1/tree/main/ckpt/magi/24B_distill_quant)
- [MAGI-1 4.5B Distill](https://huggingface.co/sand-ai/MAGI-1/tree/main/ckpt/magi/4.5B_distill)
- [MAGI-1 4.5B Distill+Quant (FP8)](https://huggingface.co/sand-ai/MAGI-1/tree/main/ckpt/magi/4.5B_distill_quant)

> [!TIP]
> Click on the MAGI-1 models in the right sidebar for more examples of video generation.

### Text-to-Video Generation

The examples below demonstrate how to generate a video from text, optimized either for reduced memory usage or for inference speed.

<hfoptions id="T2V usage">
<hfoption id="T2V memory">

Refer to the [Reduce memory usage](../../optimization/memory) guide for more details about the various memory saving techniques.

The MAGI-1 text-to-video model below requires ~13GB of VRAM.

```py
import torch
from diffusers import AutoModel, Magi1Pipeline
from diffusers.hooks.group_offloading import apply_group_offloading
from diffusers.utils import export_to_video
from transformers import T5EncoderModel

text_encoder = T5EncoderModel.from_pretrained("sand-ai/MAGI-1", subfolder="text_encoder", torch_dtype=torch.bfloat16)
vae = AutoModel.from_pretrained("sand-ai/MAGI-1", subfolder="vae", torch_dtype=torch.float32)
transformer = AutoModel.from_pretrained("sand-ai/MAGI-1", subfolder="transformer", torch_dtype=torch.bfloat16)

# group-offloading
onload_device = torch.device("cuda")
offload_device = torch.device("cpu")
apply_group_offloading(
    text_encoder,
    onload_device=onload_device,
    offload_device=offload_device,
    offload_type="block_level",
    num_blocks_per_group=4,
)
transformer.enable_group_offload(
    onload_device=onload_device,
    offload_device=offload_device,
    offload_type="leaf_level",
    use_stream=True,
)

pipeline = Magi1Pipeline.from_pretrained(
    "sand-ai/MAGI-1",
    vae=vae,
    transformer=transformer,
    text_encoder=text_encoder,
    torch_dtype=torch.bfloat16,
)
pipeline.to("cuda")

prompt = """
A majestic eagle soaring over a mountain landscape. The eagle's wings are spread wide,
catching the golden sunlight as it glides through the clear blue sky. Below, snow-capped
mountains stretch to the horizon, with pine forests and a winding river visible in the valley.
"""
negative_prompt = """
Poor quality, blurry, pixelated, low resolution, distorted proportions, unnatural colors,
watermark, text overlay, incomplete rendering, glitches, artifacts, unrealistic lighting
"""

output = pipeline(
    prompt=prompt,
    negative_prompt=negative_prompt,
    num_frames=24,
    guidance_scale=7.0,
).frames[0]
export_to_video(output, "output.mp4", fps=8)
```

</hfoption>
<hfoption id="T2V inference speed">

[Compilation](../../optimization/fp16#torchcompile) is slow the first time but subsequent calls to the pipeline are faster.

```py
import torch
from diffusers import AutoModel, Magi1Pipeline
from diffusers.utils import export_to_video
from transformers import T5EncoderModel

text_encoder = T5EncoderModel.from_pretrained("sand-ai/MAGI-1", subfolder="text_encoder", torch_dtype=torch.bfloat16)
vae = AutoModel.from_pretrained("sand-ai/MAGI-1", subfolder="vae", torch_dtype=torch.float32)
transformer = AutoModel.from_pretrained("sand-ai/MAGI-1", subfolder="transformer", torch_dtype=torch.bfloat16)

pipeline = Magi1Pipeline.from_pretrained(
    "sand-ai/MAGI-1",
    vae=vae,
    transformer=transformer,
    text_encoder=text_encoder,
    torch_dtype=torch.bfloat16,
)
pipeline.to("cuda")

# torch.compile
pipeline.transformer.to(memory_format=torch.channels_last)
pipeline.transformer = torch.compile(
    pipeline.transformer, mode="max-autotune", fullgraph=True
)

prompt = """
A majestic eagle soaring over a mountain landscape. The eagle's wings are spread wide,
catching the golden sunlight as it glides through the clear blue sky. Below, snow-capped
mountains stretch to the horizon, with pine forests and a winding river visible in the valley.
"""
negative_prompt = """
Poor quality, blurry, pixelated, low resolution, distorted proportions, unnatural colors,
watermark, text overlay, incomplete rendering, glitches, artifacts, unrealistic lighting
"""

output = pipeline(
    prompt=prompt,
    negative_prompt=negative_prompt,
    num_frames=24,
    guidance_scale=7.0,
).frames[0]
export_to_video(output, "output.mp4", fps=8)
```

</hfoption>
</hfoptions>

### Image-to-Video Generation

The example below demonstrates how to use the image-to-video pipeline to generate a video animation from a single image using text prompts for guidance.

<hfoptions id="I2V usage">
<hfoption id="usage">

```python
import torch
from diffusers import Magi1ImageToVideoPipeline, AutoencoderKLMagi1
from diffusers.utils import export_to_video, load_image

model_id = "sand-ai/MAGI-1-I2V"
vae = AutoencoderKLMagi1.from_pretrained(model_id, subfolder="vae", torch_dtype=torch.float32)
pipe = Magi1ImageToVideoPipeline.from_pretrained(model_id, vae=vae, torch_dtype=torch.bfloat16)
pipe.to("cuda")

# Load input image
image = load_image("https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/astronaut.jpg")

prompt = (
    "An astronaut walking on the moon's surface, with the Earth visible in the background. "
    "The astronaut moves slowly in a low-gravity environment, kicking up lunar dust with each step."
)
negative_prompt = "Bright tones, overexposed, static, blurred details, worst quality, low quality"

output = pipe(
    image=image,
    prompt=prompt,
    negative_prompt=negative_prompt,
    height=480,
    width=832,
    num_frames=81,  # Generate 81 frames (~5 seconds at 16fps)
    guidance_scale=5.0,
    num_inference_steps=50,
).frames[0]
export_to_video(output, "astronaut_animation.mp4", fps=16)
```

</hfoption>
</hfoptions>

### Video-to-Video Generation

The example below demonstrates how to use the video-to-video pipeline to extend or continue an existing video using text prompts.

<hfoptions id="V2V usage">
<hfoption id="usage">

```python
import torch
from diffusers import Magi1VideoToVideoPipeline, AutoencoderKLMagi1
from diffusers.utils import export_to_video, load_video

model_id = "sand-ai/MAGI-1-V2V"
vae = AutoencoderKLMagi1.from_pretrained(model_id, subfolder="vae", torch_dtype=torch.float32)
pipe = Magi1VideoToVideoPipeline.from_pretrained(model_id, vae=vae, torch_dtype=torch.bfloat16)
pipe.to("cuda")

# Load prefix video (e.g., first 24 frames of a video)
video = load_video("path/to/input_video.mp4")[:24]  # keep the first 24 frames as the prefix

prompt = (
    "Continue this video with smooth camera motion and consistent style. "
    "The scene evolves naturally with coherent motion and lighting."
)
negative_prompt = "Bright tones, overexposed, static, blurred details, worst quality, low quality, jumpy motion"

output = pipe(
    video=video,
    prompt=prompt,
    negative_prompt=negative_prompt,
    height=480,
    width=832,
    num_frames=81,  # Total frames including prefix (24 prefix + 57 generated)
    guidance_scale=5.0,
    num_inference_steps=50,
).frames[0]
export_to_video(output, "video_continuation.mp4", fps=16)
```

</hfoption>
</hfoptions>

## Notes

- MAGI-1 uses autoregressive chunked generation with `chunk_width=6` and `window_size=4`, enabling efficient long video generation.
- The model supports special tokens for quality control (HQ_TOKEN), style (THREE_D_MODEL_TOKEN, TWO_D_ANIME_TOKEN), and motion guidance (STATIC_FIRST_FRAMES_TOKEN, DYNAMIC_FIRST_FRAMES_TOKEN).
- For I2V, the input image is encoded as a clean prefix chunk to condition the video generation.
- For V2V, input video frames (typically 24 frames or ~1.5 seconds) are encoded as clean prefix chunks, and the model generates a continuation.
- MAGI-1 supports LoRAs with [`~loaders.Magi1LoraLoaderMixin.load_lora_weights`]; see the loading sketch after this list.
- Distillation mode can be enabled for faster inference with `enable_distillation=True` (requires distilled model checkpoint).
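
The snippet below is a minimal sketch of LoRA loading through that mixin. The LoRA repository id and weight file name are placeholders, not a published checkpoint.

```py
import torch
from diffusers import Magi1Pipeline

pipeline = Magi1Pipeline.from_pretrained("sand-ai/MAGI-1", torch_dtype=torch.bfloat16)
# Placeholder repository and file name; substitute a real MAGI-1 LoRA checkpoint.
pipeline.load_lora_weights("your-username/magi1-lora", weight_name="pytorch_lora_weights.safetensors")
pipeline.to("cuda")
```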
3 changes: 2 additions & 1 deletion docs/source/en/optimization/attention_backends.md
@@ -149,5 +149,6 @@ Refer to the table below for a complete list of available attention backends and
| `_sage_qk_int8_pv_fp16_cuda` | [SageAttention](https://github.com/thu-ml/SageAttention) | INT8 QK + FP16 PV (CUDA) |
| `_sage_qk_int8_pv_fp16_triton` | [SageAttention](https://github.com/thu-ml/SageAttention) | INT8 QK + FP16 PV (Triton) |
| `xformers` | [xFormers](https://github.com/facebookresearch/xformers) | Memory-efficient attention |
| `magi` | [MagiAttention](https://github.com/SandAI-org/MagiAttention) | Context-parallel attention with linear scalability and heterogeneous mask support |

</details>
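
As with the other backends in this table, the `magi` backend can be selected on a loaded model, assuming the [MagiAttention](https://github.com/SandAI-org/MagiAttention) package is installed; a minimal sketch using the `set_attention_backend` helper covered earlier in this guide:

```py
# Requires the MagiAttention package to be installed.
pipeline.transformer.set_attention_backend("magi")
```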