Production-ready • Modular • Clean • Fully Scalable ComfyUI Workflow
V3 represents the latest evolution of my complete ComfyUI workflow system, expanding beyond visual generation into a fully integrated Image + Audio production environment.
Engineered for performance, modularity, and scalability, V3 delivers a production-ready creative pipeline covering advanced image synthesis, speech generation, and AI-driven music workflows.
Designed around clarity and total creative control, this system provides a clean yet ultra-complete environment, allowing streamlined generation while exposing every critical parameter for professionals who require deep technical flexibility.
Creations 1–4 • Generated with ComfyUI • By @CELECYA
A structured overview of the working environments included in V3:
| TXT2IMG | IMG2IMG |
|---|---|
| ![]() | ![]() |

| IMG2IN & OUT-PAINTING | IMG2UPSCALE |
|---|---|
| ![]() | ![]() |

| VIBEVOICE SPEECH | KOKORO SPEECH |
|---|---|
| ![]() | ![]() |

| ACE-STEP MUSIC | CHECKPOINT MODEL MUSIC |
|---|---|
| ![]() | ![]() |
| Feature | Description |
|---|---|
| ✏️ Text → Image Generation | High-fidelity image synthesis from prompts with full exposure of sampling, conditioning, and guidance parameters. |
| 🎨 Image → Image Transformation | Controlled reinterpretation, stylization, and refinement of existing visuals with precise denoise management. |
| 🖌 Inpainting & Outpainting | Seamless localized edits or canvas expansion while maintaining global visual harmony. |
| 📈 Professional Upscaling | Resolution enhancement with detail preservation and structural consistency — suitable for high-resolution output and print workflows. |
| 🎯 Advanced ControlNet Integration | Fine-grained structural control over pose, depth, composition, and spatial coherence for predictable results. |
| 🎧 Speech Generation (VibeVoice & Kokoro) | Modular text-to-speech pipelines with tokenizer integration, ONNX acceleration, and structured audio routing. |
| 🎼 Music Generation (ACE-Step & Checkpoint Models) | AI-driven music synthesis workflows supporting both structured model pipelines and checkpoint-based generation. |
| 🧩 Subgraph-Based Architecture (Latest ComfyUI) | Built using the new Subgraph system, ensuring modularity and preventing node clutter while keeping full parameter exposure for advanced users. |
V3 is architecture-agnostic and supports both visual and audio ecosystems.
| Image Model Type | ✅ Support Level |
|---|---|
| ⚡ GGUF (FLUX & QWEN) | Fully Supported |
| 🥷 NUNCHAKU | Fully Supported |
| 🖌 SDXL | Fully Supported |
| 🐎 PONY | Fully Supported |
| 🌌 ILLUSTRIOUS | Fully Supported |
| 🎯 Stable Diffusion (SD 1.5 / 2.x) | Fully Supported |
| Audio Model Type | ✅ Support Level |
|---|---|
| 🗣 VibeVoice | Fully Supported |
| 🧠 Kokoro-ONNX | Fully Supported |
| 🎼 ACE-Step Music | Fully Supported |
| 🎵 Checkpoint-Based Music Models | Fully Supported |
| Step | Action |
|---|---|
| 1️⃣ | Download the latest Stable release here: Download Workflow |
| 2️⃣ | Place the downloaded file inside your ComfyUI /workflows folder and unzip it. |
| 3️⃣ | Launch ComfyUI and load the workflow. |
| 4️⃣ | Select your model and start generating. |
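Steps 2 and 3 can also be scripted. A minimal sketch of the unzip-and-place step — the archive name and install path below are hypothetical, so substitute your own download location and ComfyUI folder:

```python
import pathlib
import zipfile

def install_workflow(archive: pathlib.Path, comfyui_root: pathlib.Path) -> pathlib.Path:
    """Extract a downloaded workflow archive into ComfyUI's workflows folder."""
    workflows_dir = comfyui_root / "workflows"
    workflows_dir.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(archive) as zf:
        zf.extractall(workflows_dir)
    return workflows_dir

# Example with hypothetical paths:
# install_workflow(pathlib.Path("v3-workflow.zip"), pathlib.Path("ComfyUI"))
```

After extraction, the workflow appears in ComfyUI's workflow browser on the next launch.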
- ControlNet Union – FLUX GGUF / NUNCHAKU
  https://huggingface.co/Shakker-Labs/FLUX.1-dev-ControlNet-Union-Pro-2.0/tree/main
- ControlNet Union – SDXL / PONY / ILLUSTRIOUS
  https://huggingface.co/xinsir/controlnet-union-sdxl-1.0/tree/main
- ControlNet Union – SD (1.5 / 2.x)
  Not available yet
- ControlNet Union – QWEN
  Not available yet
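Once installed, these workflows can also be queued headlessly through ComfyUI's built-in HTTP API — a standard ComfyUI feature, not specific to V3. A minimal sketch, assuming a default local server at `127.0.0.1:8188` and a graph exported via "Save (API Format)":

```python
import json
import urllib.request

def build_prompt_payload(workflow: dict, client_id: str = "v3-client") -> bytes:
    """Wrap an API-format workflow graph in the JSON body that /prompt expects."""
    return json.dumps({"prompt": workflow, "client_id": client_id}).encode("utf-8")

def queue_workflow(workflow: dict, server: str = "http://127.0.0.1:8188") -> dict:
    """POST the workflow graph to ComfyUI's /prompt endpoint; returns its JSON reply."""
    req = urllib.request.Request(
        f"{server}/prompt",
        data=build_prompt_payload(workflow),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())
```

The `client_id` value here is an arbitrary label of your choosing; ComfyUI echoes it back on its websocket so you can match progress events to your request.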
V3 is built around the same guiding principles as V2, now extended to multimodal workflows:
- Clarity over clutter – a clean interface for both image and audio pipelines
- Modularity over chaos – structured Subgraphs keep everything organized
- Full control without sacrificing usability – tweak every parameter in image, speech, or music generation
Whether you're producing quick visual concepts, recording speech, or composing AI-generated music,
V3 provides a stable, scalable, and professional foundation for multimodal creation.
V3 – Clean. Modular. Powerful. Multimodal.