
PaksaTalker: Advanced AI-Powered Talking Head Video Generation

License: MIT · Python 3.9+ · Code style: black

PaksaTalker is an enterprise-grade AI framework for generating hyper-realistic talking head videos with precise lip-sync, natural facial expressions, and life-like full-body gestures. It combines multiple state-of-the-art models, including Qwen for language processing, SadTalker for facial animation, and PantoMatrix/EMAGE for gesture generation, into a single production-ready video synthesis pipeline.

🌟 Key Features

🎭 Natural Animation

  • Precise Lip-Sync: frame-accurate audio-visual synchronization driven by SadTalker
  • Expressive Faces: emotionally aware facial animations with micro-expressions
  • Full-Body Gestures: context-appropriate head movements, body language, and hand gestures
  • High Fidelity: up to 4K resolution with minimal artifacts

πŸ› οΈ Technical Capabilities

  • Multi-Model Architecture: integrates Qwen LLM, SadTalker, Wav2Lip, and PantoMatrix/EMAGE
  • GPU-Accelerated: optimized for NVIDIA GPUs with CUDA support
  • Batch Processing: render multiple jobs in one run
  • Real-Time Preview: inspect results before final rendering
  • RESTful API: easy integration with existing systems

🧩 Extensible Architecture

  • Modular Pipeline: independent, swappable components for face, body, and voice
  • Custom Avatars: support for 3D models and 2D images
  • Plugin System: extend with custom models, voices, and effects
  • Multi-Language: support for multiple languages and accents
  • Customization: fine-tune animation styles and rendering parameters

πŸš€ Getting Started

Prerequisites

  • Python 3.9+ with pip

  • Node.js 16+ and npm 8+ (for the web interface)

  • CUDA 11.8+ (for GPU acceleration; see the quick check below)

  • ffmpeg 4.4+ (for video processing)

  • NVIDIA GPU with 8GB+ VRAM (16GB+ recommended)

  • Docker (optional, for containerized deployment)
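
Before installing the models, you can confirm that CUDA is visible from Python. This is a minimal sketch assuming PyTorch is installed (the underlying models, such as SadTalker and Wav2Lip, are PyTorch-based):

# Quick GPU sanity check (assumes PyTorch is installed)
import torch

if torch.cuda.is_available():
    name = torch.cuda.get_device_name(0)
    vram_gb = torch.cuda.get_device_properties(0).total_memory / 1024**3
    print(f"CUDA available: {name} ({vram_gb:.1f} GB VRAM)")
else:
    print("CUDA not available; generation will fall back to CPU (much slower).")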

Installation

  1. Clone the repository:

    git clone https://github.com/paksaitsolutions/PaksaTalker.git
    cd PaksaTalker
  2. Set up Python environment:

    # Create and activate virtual environment
    python -m venv venv
    # On Windows:
    .\venv\Scripts\activate
    # On macOS/Linux:
    # source venv/bin/activate
    
    # Install Python dependencies
    pip install -r requirements.txt
  3. Install AI Models:

    # Download SadTalker models
    python -c "from models.sadtalker import download_models; download_models()"
    
    # Download PantoMatrix/EMAGE models
    python -c "from models.gesture import download_models; download_models()"
    
    # Download Qwen model weights (optional, can use API)
    # python -c "from models.qwen import download_models; download_models()"
  4. Set up frontend (for web interface):

    cd frontend
    npm install
    npm run build
    cd ..
    

πŸ–₯️ Quick Start

Command Line Interface

# Basic usage
python -m PaksaTalker.cli \
    --image input/face.jpg \
    --audio input/speech.wav \
    --output output/result.mp4 \
    --enhance_face True \
    --expression_intensity 0.8

# Advanced options
python -m PaksaTalker.cli \
    --image input/face.jpg \
    --audio input/speech.wav \
    --output output/result.mp4 \
    --resolution 1080 \
    --fps 30 \
    --background blur \
    --gesture_level medium

Python API

from PaksaTalker import PaksaTalker

### Generate a Talking Avatar from Text

```bash
# Generate speech and animate avatar from text
python -m cli.generate \
    --text "Hello, I'm your AI assistant. Welcome to PaksaTalker!" \
    --image assets/avatars/default.jpg \
    --voice en-US-JennyNeural \
    --output output/welcome.mp4 \
    --gesture-style natural \
    --resolution 1080

Animate with Custom Audio

# Animate avatar with existing audio
python -m cli.animate \
    --image assets/avatars/presenter.jpg \
    --audio input/presentation.wav \
    --output output/presentation.mp4 \
    --expression excited \
    --background blur \
    --lighting studio

Advanced Options

# Full pipeline with custom settings
python -m cli.pipeline \
    --prompt "Explain quantum computing in simple terms" \
    --avatar assets/avatars/scientist.jpg \
    --voice en-US-ChristopherNeural \
    --style professional \
    --gesture-level high \
    --output output/quantum_explainer.mp4 \
    --resolution 4k \
    --fps 30 \
    --enhance-face \
    --background office

🐍 Python API

Basic Usage

from paksatalker import Pipeline

# Initialize the pipeline
pipeline = Pipeline(
    model_dir="models",
    device="cuda"  # or "cpu" if no GPU
)

# Generate a talking avatar video
result = pipeline.generate(
    text="Welcome to PaksaTalker, the future of digital avatars.",
    image_path="assets/avatars/host.jpg",
    voice="en-US-JennyNeural",
    output_path="output/welcome.mp4",
    gesture_style="casual",
    resolution=1080
)

print(f"Video generated at: {result['output_path']}")

Advanced Usage

from paksatalker import (
    TextToSpeech,
    FaceAnimator,
    GestureGenerator,
    VideoRenderer
)

# Initialize components
tts = TextToSpeech(voice="en-US-ChristopherNeural")
animator = FaceAnimator(model_path="models/sadtalker")
gesture = GestureGenerator(model_path="models/pantomatrix")
renderer = VideoRenderer(resolution=1080, fps=30)

# Process pipeline
text = "Let me show you how this works..."
audio = tts.generate(text)
face_animation = animator.animate("assets/avatars/assistant.jpg", audio)
body_animation = gesture.generate(audio, style="presentation")

# Render final video
video = renderer.combine(
    face_animation=face_animation,
    body_animation=body_animation,
    audio=audio,
    output_path="output/demo.mp4"
)

from PaksaTalker import PaksaTalker
from pathlib import Path

# Initialize with custom settings
pt = PaksaTalker(
    device="cuda",  # or "cpu"
    model_dir="models/",
    temp_dir="temp/"
)

# Generate video with enhanced settings
result = pt.generate(
    image_path="input/face.jpg",
    audio_path="input/speech.wav",
    output_path="output/result.mp4",
    config={
        "resolution": 1080,
        "fps": 30,
        "expression_scale": 0.9,
        "head_pose": "natural",
        "background": {
            "type": "blur",
            "blur_strength": 0.7
        },
        "post_processing": {
            "denoise": True,
            "color_correction": True,
            "stabilization": True
        }
    }
)


## πŸ—οΈ Architecture

PaksaTalker/
β”œβ”€β”€ api/                  # REST API endpoints
β”‚   β”œβ”€β”€ routes/           # API route definitions
β”‚   β”œβ”€β”€ schemas/          # Pydantic models
β”‚   └── utils/            # API utilities
β”œβ”€β”€ config/               # Configuration management
β”‚   β”œβ”€β”€ __init__.py
β”‚   └── config.py
β”œβ”€β”€ core/                 # Core functionality
β”‚   β”œβ”€β”€ engine.py         # Main processing pipeline
β”‚   β”œβ”€β”€ video.py          # Video processing
β”‚   └── audio.py          # Audio processing
β”œβ”€β”€ integrations/         # Model integrations
β”‚   β”œβ”€β”€ sadtalker/        # SadTalker implementation
β”‚   β”œβ”€β”€ wav2lip/          # Wav2Lip integration
β”‚   β”œβ”€β”€ qwen/             # Qwen language model
β”‚   └── gesture/          # Gesture generation
β”œβ”€β”€ models/               # Model architectures
β”‚   β”œβ”€β”€ base.py           # Base model interface
β”‚   └── registry.py       # Model registry
β”œβ”€β”€ static/               # Static files
β”‚   β”œβ”€β”€ css/
β”‚   β”œβ”€β”€ js/
β”‚   └── templates/
β”œβ”€β”€ tests/                # Test suite
β”‚   β”œβ”€β”€ unit/
β”‚   └── integration/
β”œβ”€β”€ utils/                # Utility functions
β”‚   β”œβ”€β”€ audio_utils.py
β”‚   β”œβ”€β”€ video_utils.py
β”‚   └── face_utils.py
β”œβ”€β”€ app.py                # Main application
β”œβ”€β”€ cli.py                # Command-line interface
└── requirements.txt      # Dependencies
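
To illustrate the modular design, here is a minimal sketch of how the base interface and registry under models/ could fit together. Only the file names base.py and registry.py come from the tree above; the class and function names below are illustrative assumptions:

# models/base.py (sketch): the interface each model integration implements
from abc import ABC, abstractmethod
from typing import Any

class BaseModel(ABC):
    @abstractmethod
    def load(self, checkpoint: str, device: str = "cuda") -> None:
        """Load model weights onto the target device."""

    @abstractmethod
    def process(self, *inputs: Any) -> Any:
        """Run inference on the given inputs."""

# models/registry.py (sketch): name-based lookup so pipelines can swap models
_REGISTRY: dict[str, type[BaseModel]] = {}

def register(name: str):
    """Class decorator that adds a model to the registry under `name`."""
    def decorator(cls: type[BaseModel]) -> type[BaseModel]:
        _REGISTRY[name] = cls
        return cls
    return decorator

def get_model(name: str) -> type[BaseModel]:
    """Look up a registered model class by name."""
    return _REGISTRY[name]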

πŸƒβ€β™‚οΈ Usage

Development Mode

  1. Start the development servers:
    # In the project root directory
    python run_dev.py
    This starts both the backend API server and the frontend development server.

Production Build

  1. Build the frontend:

    cd frontend
    npm run build
    cd ..
  2. Start the production server:

    uvicorn app:app --host 0.0.0.0 --port 8000

    The application will be available at http://localhost:8000

Command Line (Direct API)

python app.py --input "Hello world" --output output/video.mp4

πŸ”§ Configuration

PaksaTalker is highly configurable. Here's an example configuration:

# config/config.yaml
models:
  sadtalker:
    checkpoint: "models/sadtalker/checkpoints"
    config: "models/sadtalker/configs"

  wav2lip:
    checkpoint: "models/wav2lip/checkpoints"

  qwen:
    model_name: "Qwen/Qwen-7B-Chat"

processing:
  resolution: 1080
  fps: 30
  batch_size: 4
  device: "cuda"

api:
  host: "0.0.0.0"
  port: 8000
  workers: 4
  debug: false
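
A minimal sketch of loading this file with PyYAML; the project's actual loader lives in config/config.py, so the helper below is only illustrative:

# Illustrative config loader (assumes PyYAML is installed)
import yaml

def load_config(path: str = "config/config.yaml") -> dict:
    with open(path, "r", encoding="utf-8") as f:
        return yaml.safe_load(f)

config = load_config()
print(config["processing"]["device"])          # "cuda"
print(config["models"]["qwen"]["model_name"])  # "Qwen/Qwen-7B-Chat"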

πŸ§ͺ Testing

Run the test suite:

# Install test dependencies
pip install -r requirements-test.txt

# Run tests
pytest tests/

🀝 Contributing

We welcome contributions! Please see our Contributing Guidelines for details.

  1. Fork the repository
  2. Create a feature branch (git checkout -b feature/AmazingFeature)
  3. Commit your changes (git commit -m 'Add some AmazingFeature')
  4. Push to the branch (git push origin feature/AmazingFeature)
  5. Open a Pull Request

πŸ“„ License

Distributed under the MIT License. See LICENSE for more information.

πŸ“š Documentation

For detailed documentation, please visit our Documentation.

Project Structure

paksatalker/
β”œβ”€β”€ frontend/           # React + TypeScript frontend
β”‚   β”œβ”€β”€ src/            # Source files
β”‚   β”œβ”€β”€ public/         # Static files
β”‚   └── package.json    # Frontend dependencies
β”œβ”€β”€ api/                # API endpoints
β”œβ”€β”€ config/             # Configuration files
β”œβ”€β”€ models/             # AI models
β”œβ”€β”€ static/             # Static files (served by FastAPI)
β”œβ”€β”€ app.py              # Main application entry point
β”œβ”€β”€ requirements.txt    # Python dependencies
└── README.md           # This file

Environment Variables

Create a .env file in the project root with the following variables:

# Backend
DEBUG=True
PORT=8000

# Database
DATABASE_URL=sqlite:///./paksatalker.db

# JWT
SECRET_KEY=your-secret-key
ALGORITHM=HS256
ACCESS_TOKEN_EXPIRE_MINUTES=30

For development, you can also create a .env.development file in the frontend directory.
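
A minimal sketch of reading these variables in Python, assuming python-dotenv is installed; the variable names match the .env example above, but the loading code itself is illustrative:

# Illustrative .env loading (assumes python-dotenv)
import os
from dotenv import load_dotenv

load_dotenv()  # reads .env from the project root

DEBUG = os.getenv("DEBUG", "False").lower() == "true"
PORT = int(os.getenv("PORT", "8000"))
DATABASE_URL = os.getenv("DATABASE_URL", "sqlite:///./paksatalker.db")
SECRET_KEY = os.getenv("SECRET_KEY")
ALGORITHM = os.getenv("ALGORITHM", "HS256")
ACCESS_TOKEN_EXPIRE_MINUTES = int(os.getenv("ACCESS_TOKEN_EXPIRE_MINUTES", "30"))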

API Documentation

Once the server is running, visit /api/docs for interactive API documentation (Swagger UI).


πŸ“§ Contact

Project Link: https://github.com/paksaitsolutions/PaksaTalker

πŸ™ Acknowledgments

  • SadTalker - For the amazing talking head generation
  • Wav2Lip - For lip-sync technology
  • Qwen - For advanced language modeling
  • All contributors and open-source maintainers who made this project possible

Quick Start (Stable Server)

Run the bundled stable server with fallbacks and background asset prefetch:

python stable_server.py
# API: http://localhost:8000
# Swagger UI: http://localhost:8000/api/docs

Optionally force asset ensure:

curl -X POST http://localhost:8000/api/v1/assets/ensure

Fusion Background & Green‑Screen

POST /api/v1/generate/fusion-video supports optional background parameters (see the example request after this list):

  • backgroundMode: none|blur|portrait|cinematic|color|image|greenscreen
  • backgroundColor: hex (for color/greenscreen)
  • backgroundImage: file (for image/greenscreen)
  • chromaColor, similarity, blend: green‑screen chroma key tuning
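
For example, a green-screen request using Python's requests library. This is a sketch based on the parameters listed above; the endpoint also needs the avatar image and audio inputs, which are not documented here and are omitted (check /api/docs for the full schema):

# Illustrative fusion-video request (assumes the requests library)
import requests

with open("backgrounds/office.jpg", "rb") as bg:
    resp = requests.post(
        "http://localhost:8000/api/v1/generate/fusion-video",
        data={
            "backgroundMode": "greenscreen",
            "chromaColor": "#00FF00",  # key out pure green
            "similarity": "0.4",       # chroma-key tolerance
            "blend": "0.1",            # edge softening
        },
        files={"backgroundImage": bg},
    )
resp.raise_for_status()
print("status:", resp.status_code)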

AI Style Suggestions (MVP)

Suggest presets matching your context:

curl -X POST http://localhost:8000/api/v1/style-presets/suggest \
  -F prompt="energetic keynote" -F cultural_context=GLOBAL -F formality=0.7
