PyTorch implementation of Multimodal LLM-Guided Semantic Correction in Text-to-Image Diffusion

License

This repository includes two license files: Apache-2.0 (LICENCE) and GPL-3.0 (LICENSE).
HelloZicky/PPAD


🚀 Multimodal LLM-Guided Semantic Correction in Text-to-Image Diffusion


📚 Quick Start

Run the following script to start inference:

bash run.sh

🧾 Arguments

Argument Description
--save_dir Directory to save the generated results
--device Device to use for inference (cuda, cpu, etc.)
--num_images_per_prompt Number of images to generate per prompt
--num_inference_steps Total number of denoising steps; more steps generally improve quality but slow down generation
--model_path Path to the diffusion model (e.g., HunyuanDiT)
--vl_model_path Path to the vision-language model for evaluation (e.g., Qwen2.5-VL)
--seed Random seed for reproducibility
--process_steps Number of self-reflection optimization steps
--process_steps_interval Interval between consecutive self-reflection steps during generation
--process_start Fraction of the inference steps at which self-reflection starts (e.g., 0.1)
--process_end Fraction of the inference steps at which self-reflection stops (e.g., 0.9)
--use_self_reflection Whether to enable self-reflection during image generation
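The self-reflection flags interact with --num_inference_steps. The sketch below is a hypothetical reading of them, not the repository's code. It assumes that --process_start and --process_end are fractions of --num_inference_steps (end exclusive) and that --process_steps_interval is measured in denoising steps. It does not model --process_steps. The parser mirrors the documented flag names, but every default, and treating --use_self_reflection as an on/off switch, is an assumption.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    # Hypothetical parser mirroring the documented flag names.
    # All defaults below are illustrative assumptions, not the repo's values.
    parser = argparse.ArgumentParser(description="Sketch of the documented CLI")
    parser.add_argument("--save_dir", type=str, default="results")
    parser.add_argument("--device", type=str, default="cuda")
    parser.add_argument("--num_images_per_prompt", type=int, default=1)
    parser.add_argument("--num_inference_steps", type=int, default=50)
    parser.add_argument("--model_path", type=str, default=None)
    parser.add_argument("--vl_model_path", type=str, default=None)
    parser.add_argument("--seed", type=int, default=0)
    parser.add_argument("--process_steps", type=int, default=1)
    parser.add_argument("--process_steps_interval", type=int, default=5)
    parser.add_argument("--process_start", type=float, default=0.1)
    parser.add_argument("--process_end", type=float, default=0.9)
    # Assumed to be a boolean switch; the real script may expect a value.
    parser.add_argument("--use_self_reflection", action="store_true")
    return parser


def reflection_schedule(num_inference_steps, process_start, process_end,
                        process_steps_interval):
    """Return the denoising step indices at which self-reflection would run.

    Hypothetical helper: converts the start/end fractions to step indices
    (end exclusive) and walks between them every `process_steps_interval` steps.
    """
    if not 0.0 <= process_start <= process_end <= 1.0:
        raise ValueError("expected 0 <= process_start <= process_end <= 1")
    first = round(process_start * num_inference_steps)
    last = round(process_end * num_inference_steps)
    return list(range(first, last, max(1, process_steps_interval)))
```

For example, with 50 inference steps, a start of 0.1, an end of 0.9 and an interval of 5, this sketch yields steps 5, 10, 15, ..., 40.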

📁 Output Structure

Generated outputs are saved under the specified --save_dir directory. Example structure:

results/
├── image_0.png
├── image_1.png
├── ...
└── prompts.txt
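When post-processing the results you may want to match each image to its prompt. The README does not specify this mapping. The sketch below assumes that prompts.txt holds one prompt per line and that images are numbered consecutively in prompt order, with --num_images_per_prompt images per prompt. The function name pair_images_with_prompts is hypothetical.

```python
from pathlib import Path


def pair_images_with_prompts(save_dir, num_images_per_prompt):
    """Map each image_<i>.png in save_dir to a line of prompts.txt.

    Assumes (not confirmed by the README) one prompt per line and images
    numbered in prompt order, num_images_per_prompt images per prompt.
    """
    save_dir = Path(save_dir)
    text = (save_dir / "prompts.txt").read_text(encoding="utf-8")
    prompts = [line.strip() for line in text.splitlines() if line.strip()]

    def image_index(path):
        # "image_12" -> 12, so image_10 sorts after image_9.
        return int(path.stem.split("_")[1])

    pairs = []
    for image in sorted(save_dir.glob("image_*.png"), key=image_index):
        prompt_index = image_index(image) // num_images_per_prompt
        if prompt_index >= len(prompts):
            raise ValueError(f"no prompt for {image.name}")
        pairs.append((image.name, prompts[prompt_index]))
    return pairs
```

Usage: pair_images_with_prompts("results", num_images_per_prompt=2) returns a list of (file name, prompt) tuples.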

📦 Requirements

Make sure the following libraries are installed:

  • diffusers
  • transformers
  • torch

Alternatively, install all dependencies from requirements.txt (e.g., pip install -r requirements.txt).
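To confirm that the listed libraries are importable before running run.sh, a small check with the standard library's importlib.util.find_spec is enough. This is a convenience sketch, not part of the repository. It checks only the three packages named above, not any versions pinned in requirements.txt.

```python
import importlib.util

# The three libraries listed in this README.
REQUIRED = ["diffusers", "transformers", "torch"]


def missing_packages(names=REQUIRED):
    """Return the names from `names` that cannot be found on the import path."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_packages()
    if missing:
        print("Missing packages:", ", ".join(missing))
    else:
        print("All required packages found.")
```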
