Open-source video generation pipeline in ComfyUI that creates coherent multi-shot video narratives with synchronized dialogue, inspired by OpenAI's Sora 2.
Generate complete video stories from minimal input: a character description, a reference photo, and the number of scenes. The workflow orchestrates multiple AI models through a 4-LLM pipeline to handle dialogue writing, cinematography, voice casting, and animation automatically.
This repository contains three powerful workflows:
File: `workflow-full-loop-Google.json`
Simplified pipeline for generating longer videos using Google's latest models in ComfyUI. Same core approach but leverages proprietary APIs for enhanced quality and length.
File: `workflow-full-loop-Sora2-ComfyUI.json`
Character-driven multi-shot video generation with automated dialogue and perfect lip-sync. 100% open-source models.
File: `workflow-full-loop-Storycreator.json`
Text-to-story generation with custom styles, music synchronization, and complete narrative control. Open-source based.
- 4-LLM Pre-Production Pipeline: Scriptwriter → Cinematographer → Animation Director → Voice Casting
- Scene Consistency: Last-frame continuation ensures visual coherence between shots
- AI Cinematography: Custom-trained `Next Scene` LoRA interprets "Next shot..." prompts to change camera angles intelligently
- Perfect Lip-Sync: Dynamic audio-video synchronization using the custom `Audio Duration` node
- Automated Face Swapping: Maintains character identity across all generated scenes
- One-Click Generation: Complete multi-shot videos from simple inputs
- GPU: Minimum RTX 6000 Pro (96GB VRAM) to run the full workflow
- Alternative: Use RunPod or similar cloud GPU services
- Plenty of online guides available for RunPod ComfyUI setup
1. Clone or download this repository:

   ```bash
   git clone https://github.com/lovis93/ComfyUI-Workflow-Sora2Alike-Full-loop-video.git
   ```

2. Install ComfyUI-Lovis-Node (required custom nodes):

   ```bash
   # Copy to your ComfyUI custom_nodes directory
   cp -r ComfyUI-Lovis-Node /path/to/ComfyUI/custom_nodes/

   # Install dependencies
   cd /path/to/ComfyUI/custom_nodes/ComfyUI-Lovis-Node
   pip install -r requirements.txt
   ```
3. Install other required custom nodes via ComfyUI Manager:
- ComfyUI-WanVideoWrapper
- ComfyUI-Custom-Scripts (pythongosssss)
- ComfyUI LayerStyle
- ComfyUI Easy Use
- ComfyUI essentials (mb fork)
- Audio Batch
- ComfyUI Workflow Encrypt
- ComfyUI JM MiniMax API
- Infinite Talk nodes
4. Load the workflow:
   - Open ComfyUI
   - Load `workflow-full-loop-Sora2-ComfyUI.json` (the main character-based workflow)
Base Models (place in `ComfyUI/models/`):
- `qwen_image_distill_full_fp8_e4m3fn.safetensors` → Qwen-Image (text-to-image)
- `qwen_image_edit_2509_fp8_e4m3fn.safetensors` → Qwen-Image-Edit (scene transitions)
- `wan2.2_i2v_high_noise_14B_fp16.safetensors` → Wan 2.2 I2V (video generation)
- `wan2.2_i2v_low_noise_14B_fp16.safetensors` → Wan 2.2 I2V low noise
- `Wan2_1-InfiniTetalk-Single_fp16.safetensors` → Infinite Talk (talking heads)
- `wan_2.1_vae.safetensors` → Wan VAE
- `qwen_image_vae.safetensors` → Qwen VAE
- `qwen_2.5_vl_7b_fp8_scaled.safetensors` → Text encoder
- `umt5_xxl_fp16.safetensors` → UMT5
- `clip_vision_h.safetensors` → CLIP Vision
LoRAs (place in `ComfyUI/models/loras/`):
- `next-scene_lora_v1-3000.safetensors` → Next Scene LoRA (AI Cinematographer)
  - Download: lovis93/next-scene-qwen-image-lora-2509
- `motionpushin-v5-wan-i2v-14b-720p-400.safetensors` → Motion Push-In LoRA (camera motion)
- `Wan21_I2V_14B_lightx2v_cfg_step_distill_lora_rank64.safetensors` → Wan distill LoRA
- `Qwen-Image-Lightning-4steps-V1.0.safetensors` → Qwen Lightning LoRA
- `Qwen_influencer_style_v1.safetensors` → Style LoRA
Inputs:
- Main Prompt: Character description (e.g., "Audrey loves ComfyUI, she is a geek")
- Reference Photo: Clear photo of your character
- Number of Lines: How many dialogue lines/scenes to generate (e.g., 4)
- Image Size: 768x512 (faster) or higher for quality
- FPS: 24 or 30
Click Queue Prompt and wait. The workflow will automatically:
- Generate dialogue and cinematography via 4 LLMs
- Create initial scene with face swap
- Loop through remaining scenes with consistent transitions
- Synthesize audio with perfect duration matching
- Animate each scene with lip-sync
- Combine all segments into final video
Models:
- `Qwen-Image` → initial image generation for scene 1
- `Qwen-Image-Edit` → scene transitions via image editing (scenes 2+)
- `Wan 2.2 I2V` + `Infinite Talk` → video animation with talking heads
- Custom LoRAs: `Next Scene` (cinematography) + `Motion Pushin` (camera motion)
Custom Nodes (included in ComfyUI-Lovis-Node):
- Line Count Node – counts dialogue lines
- Text to Single Line Node – text formatting
- Audio Duration Node – calculates exact audio duration for perfect sync
Before video generation, 4 specialized LLMs process your input:
- LLM 1 - Scriptwriter: Generates N dialogue lines from character prompt
- LLM 2 - Cinematographer: Creates N image prompts (scenes 2+ prefixed with "Next shot...")
- LLM 3 - Animation Director: Creates N video animation prompts
- LLM 4 - Voice Casting: Generates detailed voice profile (via MiniMax)
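The four roles above can be sketched as a simple chain. This is an illustrative reconstruction, not the actual node code: `call_llm` is a hypothetical stand-in for whatever LLM backend the workflow wires up, and the prompts are invented for the example.

```python
def preproduction(character_prompt, n_lines, call_llm):
    """Hypothetical sketch of the 4-LLM pre-production chain."""
    # LLM 1 - Scriptwriter: one dialogue line per scene
    script = call_llm("scriptwriter",
                      f"Write {n_lines} dialogue lines for: {character_prompt}")
    lines = [l for l in script.splitlines() if l.strip()]

    # LLM 2 - Cinematographer: image prompts; scenes 2+ get the
    # "Next shot..." prefix that triggers the Next Scene LoRA
    image_prompts = [call_llm("cinematographer", f"Describe the shot for: {l}")
                     for l in lines]
    image_prompts = [p if i == 0 else f"Next shot... {p}"
                     for i, p in enumerate(image_prompts)]

    # LLM 3 - Animation Director: one video animation prompt per scene
    video_prompts = [call_llm("animation_director", f"Animate: {p}")
                     for p in image_prompts]

    # LLM 4 - Voice Casting: a single voice profile for the character
    voice_profile = call_llm("voice_casting",
                             f"Voice profile for: {character_prompt}")
    return lines, image_prompts, video_prompts, voice_profile
```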
Phase 1 – Scene 1 Initialization:
- Generate the initial frame with `Qwen-Image`
- Automated face swap (reference photo → generated image)
- Synthesize audio for dialogue line 1
- Measure the exact audio duration (`Audio Duration` node)
- Animate with `Wan 2.2 I2V` + `Infinite Talk` (duration = audio length)
Phase 2 – Iterative Loop (Scenes 2–N):
- Take the last frame of the previous video clip
- Edit the frame with `Qwen-Image-Edit` + `Next Scene` LoRA + a "Next shot..." prompt
- Result: a new camera angle while preserving the character and environment
- Automated face swap on the new frame
- Generate audio → measure duration → animate
- Repeat until all lines are complete
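The per-scene loop boils down to a few lines of pseudocode. This is a minimal sketch under stated assumptions: `edit_frame`, `face_swap`, `tts`, and `animate` are hypothetical stand-ins for the corresponding ComfyUI nodes, not real node APIs.

```python
import math

def frames_for_audio(duration_s, fps):
    # The key sync trick: video length is derived from audio length,
    # so lip-sync never needs manual timing.
    return math.ceil(duration_s * fps)

def generate_scene(prev_last_frame, line, image_prompt, fps, nodes):
    # Scenes 2+ start from the previous clip's last frame plus a
    # "Next shot..." prompt, which changes the camera angle while
    # keeping the character and environment intact.
    frame = nodes["edit_frame"](prev_last_frame, image_prompt)
    frame = nodes["face_swap"](frame)
    audio, duration_s = nodes["tts"](line)
    clip = nodes["animate"](frame, audio, frames_for_audio(duration_s, fps))
    return clip  # the caller reuses clip[-1] as the next scene's start frame
```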
Phase 3 – Final Assembly:
- Concatenate all video segments + audio clips
- Output a single continuous video
- Last-frame continuation: Each scene starts from the previous scene's final frame
- "Next shot..." trigger: Activates the `Next Scene` LoRA for intelligent camera-angle changes
- Dynamic synchronization: Video length automatically matches audio duration (no manual timing)
Line Count Node (text/utility):
- Counts lines in text input
- Determines number of scenes to generate
Text to Single Line Node (text/utility):
- Converts multi-line text to single line with customizable separators
Audio Duration Node (audio/utility):
- Extracts duration from audio data in seconds
- Calculates frame count based on FPS
- Critical for perfect lip-sync synchronization
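The core logic of these three utility nodes is simple enough to sketch. The functions below are illustrative reconstructions from the descriptions above, not the actual node implementations (see ComfyUI-Lovis-Node for those):

```python
import math

def line_count(text):
    # Line Count Node: one non-empty dialogue line = one scene
    return len([l for l in text.splitlines() if l.strip()])

def to_single_line(text, separator=" "):
    # Text to Single Line Node: flatten multi-line text with a
    # customizable separator
    return separator.join(l.strip() for l in text.splitlines() if l.strip())

def audio_frame_count(num_samples, sample_rate, fps):
    # Audio Duration Node: duration in seconds -> frame count at the
    # chosen FPS, so the animated clip exactly covers the audio
    duration_s = num_samples / sample_rate
    return math.ceil(duration_s * fps)
```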
See `ComfyUI-Lovis-Node/README.md` for detailed documentation.
Alternative workflow for creating narrative sequences with more creative control.
Features:
- Text-to-Story: Automatically generates visual sequences from any story text
- Custom Styles: Apply different visual styles and looks to your narrative
- Music Synchronization: Generates music automatically timed to the story
- Same Core Tools: Built on the same model foundation as the main workflow
Use Case: Ideal for creating complete stories, film sequences, or narrative content with custom aesthetics and synchronized soundtracks.
Models & Technologies:
- Qwen-Image / Qwen-Image-Edit – Alibaba Cloud
- Wan 2.2 I2V – Wan AI
- Infinite Talk – Infinite Talk Project
- MiniMax Voice Synthesis – MiniMax AI
Custom LoRAs:
- Next Scene LoRA – trained by @lovis93 on 100+ cinematic transitions
- Motion Push-In LoRA – trained by @lovis93 on 100+ drone camera clips
Inspiration:
- OpenAI Sora 2 – for demonstrating the potential of AI video generation
MIT License – free to use, modify, and share.
Custom LoRAs: Apache 2.0 License (see model cards on HuggingFace).
- GitHub: lovis93
- X: lovis93
- HuggingFace: @lovis93
- Issues: Report bugs or request features
Built with ❤️ for the open-source AI community
Making cinematic AI video generation accessible to everyone.





