PyTorch implementation of Multimodal LLM-Guided Semantic Correction in Text-to-Image Diffusion
Run the following script to start inference:

```bash
bash run.sh
```
| Argument | Description |
|---|---|
| `--save_dir` | Directory to save the generated results |
| `--device` | Device to use for inference (`cuda`, `cpu`, etc.) |
| `--num_images_per_prompt` | Number of images to generate per prompt |
| `--num_inference_steps` | Total number of denoising steps; higher values improve quality but slow down generation |
| `--model_path` | Path to the diffusion model (e.g., HunyuanDiT) |
| `--vl_model_path` | Path to the vision-language model used for evaluation (e.g., Qwen2.5-VL) |
| `--seed` | Random seed for reproducibility |
| `--process_steps` | Number of self-reflection optimization steps |
| `--process_steps_interval` | Interval between self-reflection steps during generation |
| `--process_start` | Fraction of inference steps after which self-reflection starts (e.g., 0.1) |
| `--process_end` | Fraction of inference steps at which self-reflection stops (e.g., 0.9) |
| `--use_self_reflection` | Whether to enable self-reflection during image generation |
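As a reference, `run.sh` might look roughly like the sketch below. The Python entry-point name (`inference.py`) and the concrete model paths are placeholders, not part of this repository's documented interface; the `--process_start`/`--process_end` values follow the examples in the table above, and the remaining values are illustrative assumptions.

```bash
#!/bin/bash
# Hypothetical contents of run.sh -- the script name `inference.py` and the
# model paths are placeholders; adjust them to your local setup.
python inference.py \
    --save_dir ./results \
    --device cuda \
    --num_images_per_prompt 1 \
    --num_inference_steps 50 \
    --model_path /path/to/HunyuanDiT \
    --vl_model_path /path/to/Qwen2.5-VL \
    --seed 42 \
    --process_steps 3 \
    --process_steps_interval 5 \
    --process_start 0.1 \
    --process_end 0.9 \
    --use_self_reflection
```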
Generated outputs are saved under the specified `--save_dir` directory. Example structure:

```
results/
├── image_0.png
├── image_1.png
├── ...
└── prompts.txt
```
Make sure the following libraries are installed:

- `diffusers`
- `transformers`
- `torch`

Alternatively, install all dependencies from `requirements.txt`.
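For example, the dependencies can be installed with pip (a minimal sketch; exact pinned versions, if any, are given in `requirements.txt`):

```bash
# Install the listed libraries directly...
pip install diffusers transformers torch

# ...or install everything from the repository's requirements file
pip install -r requirements.txt
```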