MAGI Inference Example

An inference example for MAGI, a multimodal video generation model.

GitHub Repository: MAGI-1

Prerequisites

Install Dependencies

# Install ffmpeg
conda install -c conda-forge ffmpeg=4.4

# For GPUs based on the Hopper architecture (e.g., H100/H800), it is recommended to install MagiAttention for acceleration
# For non-Hopper GPUs, installing MagiAttention is not necessary
mkdir -p 3rd_party/
git clone https://github.com/SandAI-org/MagiAttention.git 3rd_party/MagiAttention
cd 3rd_party/MagiAttention
git submodule update --init --recursive
pip install --no-build-isolation .
cd -
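
To decide whether the optional MagiAttention step applies to your machine, you can check the GPU's CUDA compute capability. The sketch below is not part of the project. It assumes PyTorch is installed and relies on the fact that Hopper GPUs such as the H100 and H800 report compute capability 9.x. The helper names `is_hopper` and `detect_capability` are hypothetical.

```python
from typing import Optional, Tuple

HOPPER_MAJOR = 9  # H100/H800 report CUDA compute capability 9.0


def is_hopper(capability: Tuple[int, int]) -> bool:
    """Return True when a (major, minor) compute capability is Hopper."""
    major, _minor = capability
    return major == HOPPER_MAJOR


def detect_capability() -> Optional[Tuple[int, int]]:
    """Query the first CUDA device via PyTorch; return None if unavailable."""
    try:
        import torch  # assumption: PyTorch is already installed
    except ImportError:
        return None
    if not torch.cuda.is_available():
        return None
    return torch.cuda.get_device_capability(0)


if __name__ == "__main__":
    cap = detect_capability()
    if cap is None:
        print("No CUDA device detected; skip MagiAttention.")
    elif is_hopper(cap):
        print(f"Compute capability {cap}: Hopper GPU, MagiAttention is recommended.")
    else:
        print(f"Compute capability {cap}: non-Hopper GPU, MagiAttention is not needed.")
```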

Download Model Weights

huggingface-cli download sand-ai/MAGI-1 --local-dir weights/MAGI-1 --include "ckpt/*"

Inference

MAGI-1 24B Distilled Quantized Version (4 GPUs)

To run the MAGI-1 24B distilled quantized model:

example/magi/magi_24B.sh

MAGI-1 4.5B Base Version (Single GPU)

To run the MAGI-1 4.5B base model:

example/magi/magi_4.5B.sh

Configuration

Model Configuration Files

The project provides configuration files for different model scales and types:

  • 24B_base_config.json: 24B base model configuration
  • 24B_distill_config.json: 24B distilled model configuration
  • 24B_distill_quant_config.json: 24B distilled quantized model configuration
  • 4.5B_base_config.json: 4.5B base model configuration
  • 4.5B_distill_config.json: 4.5B distilled model configuration
  • 4.5B_distill_quant_config.json: 4.5B distilled quantized model configuration

Please modify the configuration files and scripts according to your hardware setup and model paths before use.

Key configuration parameters:

  • model_config: Model architecture parameters (hidden_size, num_layers, etc.)
  • runtime_config: Runtime parameters (cfg_number, num_frames, video_size, etc.)
  • engine_config: Engine parameters (distributed settings, kv_offload, etc.)
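
Before editing a configuration file, it can help to check that the three sections listed above are present and to write changes to a copy rather than the original. The following is a minimal sketch, assuming the three sections are top-level keys of the JSON file. The helpers `load_sections` and `override_runtime` are hypothetical and not part of the project.

```python
import json
from pathlib import Path
from typing import Any, Dict

# Section names taken from the list above; assumed to be top-level JSON keys.
SECTIONS = ("model_config", "runtime_config", "engine_config")


def load_sections(config_path: str) -> Dict[str, Dict[str, Any]]:
    """Load a config JSON and return its three sections, failing early if one is missing."""
    config = json.loads(Path(config_path).read_text(encoding="utf-8"))
    missing = [name for name in SECTIONS if name not in config]
    if missing:
        raise KeyError(f"{config_path} is missing sections: {missing}")
    return {name: config[name] for name in SECTIONS}


def override_runtime(config_path: str, output_path: str, **overrides: Any) -> None:
    """Write a copy of the config with selected runtime_config values replaced."""
    config = json.loads(Path(config_path).read_text(encoding="utf-8"))
    config.setdefault("runtime_config", {}).update(overrides)
    Path(output_path).write_text(json.dumps(config, indent=2), encoding="utf-8")
```

For example, `override_runtime(src, dst, num_frames=48)` would write a copy of `src` with a different `num_frames` value. The value 48 is purely illustrative, so check the shipped configuration files for the values the model actually accepts.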

Supported Generation Modes

  1. Text-to-Video (T2V): Generate videos from text descriptions
  2. Image-to-Video (I2V): Generate videos from input images
  3. Video-to-Video (V2V): Generate new videos from input videos

Running Different Modes

# Text-to-Video
python example/magi/run_magi.py \
    --config_file example/magi/configs/4.5B/4.5B_base_config.json \
    --mode t2v \
    --prompt "A beautiful sunset over the ocean" \
    --output_path ./output_t2v.mp4

# Image-to-Video
python example/magi/run_magi.py \
    --config_file example/magi/configs/4.5B/4.5B_base_config.json \
    --mode i2v \
    --prompt "Make the cat dance" \
    --image_path ./input_image.jpg \
    --output_path ./output_i2v.mp4

# Video-to-Video
python example/magi/run_magi.py \
    --config_file example/magi/configs/4.5B/4.5B_base_config.json \
    --mode v2v \
    --prompt "Change the weather to snow" \
    --prefix_video_path ./input_video.mp4 \
    --output_path ./output_v2v.mp4
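
The three commands differ only in the `--mode` value and in the extra input flag each mode takes. The sketch below builds the argument list for any of them and rejects mismatched inputs before a long run is launched. It uses only the flags shown above. The helper `build_command` is hypothetical and not part of the project.

```python
from typing import List, Optional

# Extra input flag each mode takes, as used in the commands above.
MODE_INPUT_FLAG = {"t2v": None, "i2v": "--image_path", "v2v": "--prefix_video_path"}


def build_command(config_file: str, mode: str, prompt: str, output_path: str,
                  input_path: Optional[str] = None) -> List[str]:
    """Return the run_magi.py argument list for the given generation mode."""
    if mode not in MODE_INPUT_FLAG:
        raise ValueError(f"unknown mode {mode!r}; expected one of {sorted(MODE_INPUT_FLAG)}")
    flag = MODE_INPUT_FLAG[mode]
    if flag is not None and input_path is None:
        raise ValueError(f"mode {mode!r} requires an input file for {flag}")
    if flag is None and input_path is not None:
        raise ValueError(f"mode {mode!r} does not take an input file")

    cmd = ["python", "example/magi/run_magi.py",
           "--config_file", config_file,
           "--mode", mode,
           "--prompt", prompt]
    if flag is not None:
        cmd += [flag, input_path]
    cmd += ["--output_path", output_path]
    return cmd
```

The returned list can be passed to `subprocess.run(cmd, check=True)` from the repository root, which avoids shell quoting problems when the prompt contains spaces or quotes.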

Performance Profiling

Inferix has built-in performance profiling for MAGI. To enable it, modify the run script so that it passes a profiling configuration to the pipeline:

from inferix.profiling.config import ProfilingConfig

# Create profiling configuration
profiling_config = ProfilingConfig(
    enabled=True,
    output_dir="./profiling_reports",
    real_time_display=True
)

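# Pass the configuration when constructing the pipeline. This applies only if
# the pipeline implementation accepts a profiling_config argument. MagiPipeline
# and args are assumed to be defined already in the run script.
pipeline = MagiPipeline(
    config_path=args.config_file,
    profiling_config=profiling_config,  # Enable profiling
)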

Profiling reports will be generated in the specified output directory, providing detailed performance metrics for different stages of the pipeline.