Skip to content

lovisdotio/ComfyUI-Workflow-Sora2Alike-Full-loop-video

Repository files navigation

ComfyUI Workflow: Sora2-Alike Full Loop Video

Open-source video generation pipeline in ComfyUI that creates coherent multi-shot video narratives with synchronized dialogue β€” inspired by OpenAI's Sora 2.

Generate complete video stories from minimal input: a character description, a reference photo, and the number of scenes. The workflow orchestrates multiple AI models through a 4-LLM pipeline to handle dialogue writing, cinematography, voice casting, and animation automatically.


🎬 Workflows Included

This repository contains three powerful workflows:

1. Google Veo 3 Pipeline ⚑ (Long-Form Video Generation)

File: workflow-full-loop-Google.json

Workflow Preview

Demo

⚠️ Note: This workflow uses paid APIs (Google Veo 3, Imagen 3, etc.). Costs can be very high for video generation. Not open-source models.

Simplified pipeline for generating longer videos using Google's latest models in ComfyUI. Same core approach but leverages proprietary APIs for enhanced quality and length.

2. Sora2-ComfyUI (Open-Source Character Videos)

File: workflow-full-loop-Sora2-ComfyUI.json

Workflow Preview

Demo

Character-driven multi-shot video generation with automated dialogue and perfect lip-sync. 100% open-source models.

3. Story Creator (Narrative Sequences with Music)

File: workflow-full-loop-Storycreator.json

Workflow Preview

Demo

Text-to-story generation with custom styles, music synchronization, and complete narrative control. Open-source based.


✨ Key Features

  • 4-LLM Pre-Production Pipeline: Scriptwriter β†’ Cinematographer β†’ Animation Director β†’ Voice Casting
  • Scene Consistency: Last-frame continuation ensures visual coherence between shots
  • AI Cinematography: Custom-trained Next Scene LoRA interprets "Next shot..." prompts to change camera angles intelligently
  • Perfect Lip-Sync: Dynamic audio-video synchronization using custom Audio Duration node
  • Automated Face Swapping: Maintains character identity across all generated scenes
  • One-Click Generation: Complete multi-shot videos from simple inputs

πŸ› οΈ Installation

Recommended Configuration

  • GPU: Minimum RTX 6000 Pro (96GB VRAM) to run the full workflow
  • Alternative: Use RunPod or similar cloud GPU services
  • Plenty of online guides available for RunPod ComfyUI setup

Download Workflow & Custom Nodes

  1. Clone or download this repository:

    git clone https://github.com/lovis93/ComfyUI-Workflow-Sora2Alike-Full-loop-video.git
  2. Install ComfyUI-Lovis-Node (required custom nodes):

    # Copy to your ComfyUI custom_nodes directory
    cp -r ComfyUI-Lovis-Node /path/to/ComfyUI/custom_nodes/
    
    # Install dependencies
    cd /path/to/ComfyUI/custom_nodes/ComfyUI-Lovis-Node
    pip install -r requirements.txt
  3. Install other required custom nodes via ComfyUI Manager:

    • ComfyUI-WanVideoWrapper
    • ComfyUI-Custom-Scripts (pythongosssss)
    • ComfyUI LayerStyle
    • ComfyUI Easy Use
    • ComfyUI essentials (mb fork)
    • Audio Batch
    • ComfyUI Workflow Encrypt
    • ComfyUI JM MiniMax API
    • Infinite Talk nodes
  4. Load the workflow:

    • Open ComfyUI
    • Load workflow-full-loop-Sora2-ComfyUI.json (main character-based workflow)

Download Models

Base Models (place in ComfyUI/models/):

  • qwen_image_distill_full_fp8_e4m3fn.safetensors β€” Qwen-Image (text-to-image)
  • qwen_image_edit_2509_fp8_e4m3fn.safetensors β€” Qwen-Image-Edit (scene transitions)
  • wan2.2_i2v_high_noise_14B_fp16.safetensors β€” Wan 2.2 I2V (video generation)
  • wan2.2_i2v_low_noise_14B_fp16.safetensors β€” Wan 2.2 I2V low noise
  • Wan2_1-InfiniTetalk-Single_fp16.safetensors β€” Infinite Talk (talking heads)
  • wan_2.1_vae.safetensors β€” Wan VAE
  • qwen_image_vae.safetensors β€” Qwen VAE
  • qwen_2.5_vl_7b_fp8_scaled.safetensors β€” Text encoder
  • umt5_xxl_fp16.safetensors β€” UMT5
  • clip_vision_h.safetensors β€” CLIP Vision

LoRAs (place in ComfyUI/models/loras/):

  • next-scene_lora_v1-3000.safetensors β€” Next Scene LoRA (AI Cinematographer)

  • motionpushin-v5-wan-i2v-14b-720p-400.safetensors β€” Motion Push-In LoRA (Camera motion)

  • Wan21_I2V_14B_lightx2v_cfg_step_distill_lora_rank64.safetensors β€” Wan distill LoRA

  • Qwen-Image-Lightning-4steps-V1.0.safetensors β€” Qwen Lightning LoRA

  • Qwen_influencer_style_v1.safetensors β€” Style LoRA


πŸš€ Usage

Workflow 1: Sora2-ComfyUI (Main Character Workflow)

Inputs:

  1. Main Prompt: Character description (e.g., "Audrey loves ComfyUI, she is a geek")
  2. Reference Photo: Clear photo of your character
  3. Number of Lines: How many dialogue lines/scenes to generate (e.g., 4)

Configuration

  • Image Size: 768x512 (faster) or higher for quality
  • FPS: 24 or 30

Run

Click Queue Prompt and wait. The workflow will automatically:

  1. Generate dialogue and cinematography via 4 LLMs
  2. Create initial scene with face swap
  3. Loop through remaining scenes with consistent transitions
  4. Synthesize audio with perfect duration matching
  5. Animate each scene with lip-sync
  6. Combine all segments into final video

🎬 How It Works

Core Architecture

Models:

  • Qwen-Image β€” Initial image generation for scene 1
  • Qwen-Image-Edit β€” Scene transitions via image editing (scenes 2+)
  • Wan 2.2 I2V + Infinite Talk β€” Video animation with talking heads
  • Custom LoRAs: Next Scene (cinematography) + Motion Pushin (camera motion)

Custom Nodes (included in ComfyUI-Lovis-Node):

  • Line Count Node β€” Counts dialogue lines
  • Text to Single Line Node β€” Text formatting
  • Audio Duration Node β€” Calculates exact audio duration for perfect sync

4-LLM Pre-Processing Pipeline

Before video generation, 4 specialized LLMs process your input:

  1. LLM 1 - Scriptwriter: Generates N dialogue lines from character prompt
  2. LLM 2 - Cinematographer: Creates N image prompts (scenes 2+ prefixed with "Next shot...")
  3. LLM 3 - Animation Director: Creates N video animation prompts
  4. LLM 4 - Voice Casting: Generates detailed voice profile (via MiniMax)

Execution Pipeline

Phase 1 β€” Scene 1 Initialization:

  1. Generate initial frame with Qwen-Image
  2. Automated face swap (reference photo β†’ generated image)
  3. Synthesize audio for dialogue line 1
  4. Measure exact audio duration (Audio Duration node)
  5. Animate with Wan 2.2 I2V + Infinite Talk (duration = audio length)

Phase 2 β€” Iterative Loop (Scenes 2-N):

  1. Take last frame of previous video clip
  2. Edit frame with Qwen-Image-Edit + Next Scene LoRA + "Next shot..." prompt
  3. Result: new camera angle while preserving character/environment
  4. Automated face swap on new frame
  5. Generate audio β†’ measure duration β†’ animate
  6. Repeat until all lines complete

Phase 3 β€” Final Assembly:

  • Concatenate all video segments + audio clips
  • Output single continuous video

Key Innovation

  • Last-frame continuation: Each scene starts from the previous scene's final frame
  • "Next shot..." trigger: Activates Next Scene LoRA for intelligent camera angle changes
  • Dynamic synchronization: Video length automatically matches audio duration (no manual timing)

πŸ“‹ Included Custom Nodes

ComfyUI-Lovis-Node

Line Count Node (text/utility):

  • Counts lines in text input
  • Determines number of scenes to generate

Text to Single Line Node (text/utility):

  • Converts multi-line text to single line with customizable separators

Audio Duration Node (audio/utility):

  • Extracts duration from audio data in seconds
  • Calculates frame count based on FPS
  • Critical for perfect lip-sync synchronization

See ComfyUI-Lovis-Node/README.md for detailed documentation.


Workflow 2: Story Creator

Alternative workflow for creating narrative sequences with more creative control.

Features:

  • Text-to-Story: Automatically generates visual sequences from any story text
  • Custom Styles: Apply different visual styles and looks to your narrative
  • Music Synchronization: Generates music automatically timed to the story
  • Same Core Tools: Built on the same model foundation as the main workflow

Use Case: Ideal for creating complete stories, film sequences, or narrative content with custom aesthetics and synchronized soundtracks.


πŸ™ Credits

Models & Technologies:

  • Qwen-Image / Qwen-Image-Edit β€” Alibaba Cloud
  • Wan 2.2 I2V β€” Wan AI
  • Infinite Talk β€” Infinite Talk Project
  • MiniMax Voice Synthesis β€” MiniMax AI

Custom LoRAs:

  • Next Scene LoRA β€” Trained by @lovis93 on 100+ cinematic transitions
  • Motion Push-In LoRA β€” Trained by @lovis93 on 100+ drone camera clips

Inspiration:

  • OpenAI Sora 2 β€” For demonstrating AI video generation potential

πŸ“œ License

MIT License β€” Free to use, modify, and share.

Custom LoRAs: Apache 2.0 License (see model cards on HuggingFace).


πŸ“ž Contact


Built with ❀️ for the open-source AI community

Making cinematic AI video generation accessible to everyone.

About

ComfyUI workflow Sora 2 alike

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Languages