PaksaTalker is an enterprise-grade AI platform for generating hyper-realistic talking-head videos with frame-accurate lip-sync, natural facial expressions, and lifelike body gestures. It combines multiple state-of-the-art models, including Qwen for language processing, SadTalker for facial animation, and PantoMatrix/EMAGE for full-body gesture generation, into a production-ready video synthesis pipeline.
- Precise Lip-Sync: Frame-accurate audio-visual synchronization using SadTalker
- Expressive Faces: Emotionally aware facial animations with micro-expressions
- Full-Body Gestures: Context-appropriate head movements, body language, and hand gestures
- High Fidelity: Up to 4K resolution with minimal artifacts
- Multi-Model Architecture: Integrates the Qwen LLM, SadTalker, Wav2Lip, and PantoMatrix/EMAGE
- GPU-Accelerated: Optimized for NVIDIA GPUs with CUDA support
- Modular Pipeline: Independent, swappable components for face, body, and voice
- Batch Processing: Process multiple generation jobs in one run
- Real-Time Preview: Preview output before final rendering
- RESTful API: Easy integration with existing systems
- Custom Avatars: Support for 3D models and 2D images
- Custom Voices: Support for custom voice models
- Plugin System: Extend with custom models and effects
- Multi-Language: Support for multiple languages and accents
- Customization: Fine-tune animation styles and rendering parameters
- Python 3.9+ with pip
- Node.js 16+ and npm 8+ (for the web interface)
- CUDA 11.8+ (for GPU acceleration)
- ffmpeg 4.4+ for video processing
- NVIDIA GPU with 16GB+ VRAM recommended
- Docker (optional, for containerized deployment)
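To sanity-check the GPU and ffmpeg requirements before installing, here is a minimal check script (a sketch that assumes PyTorch is already installed in your environment; adjust as needed):

```python
# check_env.py -- quick prerequisite check (assumes PyTorch is installed)
import shutil
import subprocess
import sys

import torch

print(f"Python: {sys.version.split()[0]}")              # expect 3.9+
print(f"CUDA available: {torch.cuda.is_available()}")   # expect True for GPU runs
if torch.cuda.is_available():
    props = torch.cuda.get_device_properties(0)
    print(f"GPU: {props.name}, VRAM: {props.total_memory / 1024**3:.1f} GB")  # 16 GB+ recommended

ffmpeg = shutil.which("ffmpeg")
if ffmpeg:
    # First line of `ffmpeg -version` reports the version (expect 4.4+)
    out = subprocess.run([ffmpeg, "-version"], capture_output=True, text=True)
    print(out.stdout.splitlines()[0])
else:
    print("ffmpeg not found on PATH")
```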
- Clone the repository:

  ```bash
  git clone https://github.com/yourusername/paksatalker.git
  cd paksatalker
  ```

- Set up the Python environment:

  ```bash
  # Create and activate a virtual environment
  python -m venv venv
  # On Windows:
  .\venv\Scripts\activate
  # On macOS/Linux:
  # source venv/bin/activate

  # Install Python dependencies
  pip install -r requirements.txt
  ```

- Install the AI models:

  ```bash
  # Download SadTalker models
  python -c "from models.sadtalker import download_models; download_models()"

  # Download PantoMatrix/EMAGE models
  python -c "from models.gesture import download_models; download_models()"

  # Download Qwen model weights (optional, can use API)
  # python -c "from models.qwen import download_models; download_models()"
  ```

- Set up the frontend (for the web interface):

  ```bash
  cd frontend
  npm install
  npm run build
  cd ..
  ```
```bash
# Basic usage
python -m PaksaTalker.cli \
  --image input/face.jpg \
  --audio input/speech.wav \
  --output output/result.mp4 \
  --enhance_face True \
  --expression_intensity 0.8

# Advanced options
python -m PaksaTalker.cli \
  --image input/face.jpg \
  --audio input/speech.wav \
  --output output/result.mp4 \
  --resolution 1080 \
  --fps 30 \
  --background blur \
  --gesture_level medium
```
### Generate a Talking Avatar from Text
```bash
# Generate speech and animate avatar from text
python -m cli.generate \
--text "Hello, I'm your AI assistant. Welcome to PaksaTalker!" \
--image assets/avatars/default.jpg \
--voice en-US-JennyNeural \
--output output/welcome.mp4 \
--gesture-style natural \
  --resolution 1080

# Animate avatar with existing audio
python -m cli.animate \
  --image assets/avatars/presenter.jpg \
  --audio input/presentation.wav \
  --output output/presentation.mp4 \
  --expression excited \
  --background blur \
  --lighting studio

# Full pipeline with custom settings
python -m cli.pipeline \
  --prompt "Explain quantum computing in simple terms" \
  --avatar assets/avatars/scientist.jpg \
  --voice en-US-ChristopherNeural \
  --style professional \
  --gesture-level high \
  --output output/quantum_explainer.mp4 \
  --resolution 4k \
  --fps 30 \
  --enhance-face \
  --background office
```

The same generation pipeline is also available from Python:

```python
from paksatalker import Pipeline

# Initialize the pipeline
pipeline = Pipeline(
model_dir="models",
device="cuda" # or "cpu" if no GPU
)
# Generate a talking avatar video
result = pipeline.generate(
text="Welcome to PaksaTalker, the future of digital avatars.",
image_path="assets/avatars/host.jpg",
voice="en-US-JennyNeural",
output_path="output/welcome.mp4",
gesture_style="casual",
resolution=1080
)
print(f"Video generated at: {result['output_path']}")from paksatalker import (
TextToSpeech,
FaceAnimator,
GestureGenerator,
VideoRenderer
)
# Initialize components
tts = TextToSpeech(voice="en-US-ChristopherNeural")
animator = FaceAnimator(model_path="models/sadtalker")
gesture = GestureGenerator(model_path="models/pantomatrix")
renderer = VideoRenderer(resolution=1080, fps=30)
# Process pipeline
text = "Let me show you how this works..."
audio = tts.generate(text)
face_animation = animator.animate("assets/avatars/assistant.jpg", audio)
body_animation = gesture.generate(audio, style="presentation")
# Render final video
video = renderer.combine(
face_animation=face_animation,
body_animation=body_animation,
audio=audio,
output_path="output/demo.mp4"
)
```
The top-level PaksaTalker class accepts a full configuration dictionary:

```python
from pathlib import Path

from PaksaTalker import PaksaTalker

pt = PaksaTalker(
    device="cuda",   # or "cpu"
    model_dir="models/",
    temp_dir="temp/"
)

result = pt.generate(
    image_path="input/face.jpg",
    audio_path="input/speech.wav",
    output_path="output/result.mp4",
    config={
        "resolution": 1080,
        "fps": 30,
        "expression_scale": 0.9,
        "head_pose": "natural",
        "background": {
            "type": "blur",
            "blur_strength": 0.7
        },
        "post_processing": {
            "denoise": True,
            "color_correction": True,
            "stabilization": True
        }
    }
)
```
## Architecture

```
PaksaTalker/
├── api/                    # REST API endpoints
│   ├── routes/             # API route definitions
│   ├── schemas/            # Pydantic models
│   └── utils/              # API utilities
├── config/                 # Configuration management
│   ├── __init__.py
│   └── config.py
├── core/                   # Core functionality
│   ├── engine.py           # Main processing pipeline
│   ├── video.py            # Video processing
│   └── audio.py            # Audio processing
├── integrations/           # Model integrations
│   ├── sadtalker/          # SadTalker implementation
│   ├── wav2lip/            # Wav2Lip integration
│   ├── qwen/               # Qwen language model
│   └── gesture/            # Gesture generation
├── models/                 # Model architectures
│   ├── base.py             # Base model interface
│   └── registry.py         # Model registry
├── static/                 # Static files
│   ├── css/
│   ├── js/
│   └── templates/
├── tests/                  # Test suite
│   ├── unit/
│   └── integration/
├── utils/                  # Utility functions
│   ├── audio_utils.py
│   ├── video_utils.py
│   └── face_utils.py
├── app.py                  # Main application
├── cli.py                  # Command-line interface
└── requirements.txt        # Dependencies
```
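The models/ package above follows a base-interface-plus-registry layout. As an illustrative sketch only (the real contents of base.py and registry.py are not shown in this README, so every name below is hypothetical), such a registry typically looks like this:

```python
# Hypothetical sketch of a model registry; names do not necessarily match
# the actual models/base.py and models/registry.py in the repository.
from abc import ABC, abstractmethod


class BaseModel(ABC):
    """Minimal interface every pluggable model implements."""

    @abstractmethod
    def load(self, checkpoint_dir: str) -> None: ...

    @abstractmethod
    def infer(self, **inputs): ...


_REGISTRY: dict[str, type] = {}


def register(name: str):
    """Class decorator that makes a model discoverable by name."""
    def wrapper(cls):
        _REGISTRY[name] = cls
        return cls
    return wrapper


def create(name: str, **kwargs) -> BaseModel:
    """Instantiate a registered model, e.g. create("sadtalker")."""
    return _REGISTRY[name](**kwargs)
```

A design like this is what makes the components swappable: a new face or gesture model only needs to implement the base interface and register itself under a name.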
- Start the development servers:

  ```bash
  # In the project root directory
  python run_dev.py
  ```

This will start:

- Frontend at http://localhost:5173
- Backend API at http://localhost:8000
- API Docs at http://localhost:8000/api/docs
- Build the frontend:

  ```bash
  cd frontend
  npm run build
  cd ..
  ```

- Start the production server:

  ```bash
  uvicorn app:app --host 0.0.0.0 --port 8000
  ```
The application will be available at http://localhost:8000
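Once the server is up, a quick smoke test against the documented Swagger UI route confirms it is serving (a sketch assuming the requests package is installed):

```python
# Smoke test: confirm the server answers on the documented Swagger UI route.
import requests

resp = requests.get("http://localhost:8000/api/docs", timeout=5)
print(resp.status_code)  # expect 200 when the server is running
```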
You can also generate a video directly from the command line:

```bash
python app.py --input "Hello world" --output output/video.mp4
```

PaksaTalker is highly configurable. Here's an example configuration:
```yaml
# config/config.yaml
models:
  sadtalker:
    checkpoint: "models/sadtalker/checkpoints"
    config: "models/sadtalker/configs"
  wav2lip:
    checkpoint: "models/wav2lip/checkpoints"
  qwen:
    model_name: "Qwen/Qwen-7B-Chat"

processing:
  resolution: 1080
  fps: 30
  batch_size: 4
  device: "cuda"

api:
  host: "0.0.0.0"
  port: 8000
  workers: 4
  debug: false
```
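A configuration file like this can be read with a few lines of Python (a minimal sketch assuming PyYAML; the loader helper below is hypothetical, not part of the project API):

```python
# Hypothetical helper for reading config/config.yaml (requires PyYAML)
import yaml


def load_config(path: str = "config/config.yaml") -> dict:
    """Read the YAML configuration into a plain dictionary."""
    with open(path, "r", encoding="utf-8") as f:
        return yaml.safe_load(f)


config = load_config()
print(config["processing"]["device"])      # e.g. "cuda"
print(config["processing"]["resolution"])  # e.g. 1080
```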
Run the test suite:

```bash
# Install test dependencies
pip install -r requirements-test.txt

# Run tests
pytest tests/
```

We welcome contributions! Please see our Contributing Guidelines for details.
- Fork the repository
- Create a feature branch (`git checkout -b feature/AmazingFeature`)
- Commit your changes (`git commit -m 'Add some AmazingFeature'`)
- Push to the branch (`git push origin feature/AmazingFeature`)
- Open a Pull Request
Distributed under the MIT License. See LICENSE for more information.
```
paksatalker/
├── frontend/            # React + TypeScript frontend
│   ├── src/             # Source files
│   ├── public/          # Static files
│   └── package.json     # Frontend dependencies
├── api/                 # API endpoints
├── config/              # Configuration files
├── models/              # AI models
├── static/              # Static files (served by FastAPI)
├── app.py               # Main application entry point
├── requirements.txt     # Python dependencies
└── README.md            # This file
```
Create a .env file in the project root with the following variables:

```
# Backend
DEBUG=True
PORT=8000
# Database
DATABASE_URL=sqlite:///./paksatalker.db
# JWT
SECRET_KEY=your-secret-key
ALGORITHM=HS256
ACCESS_TOKEN_EXPIRE_MINUTES=30
```

For development, you can also create a .env.development file in the frontend directory.
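To confirm these variables are picked up at runtime, here is a small sketch using python-dotenv (an assumption; the application may load them through its own settings module):

```python
# Sketch: read .env values with python-dotenv (assumed to be installed)
import os

from dotenv import load_dotenv

load_dotenv()  # reads .env from the current working directory

debug = os.getenv("DEBUG", "False") == "True"
port = int(os.getenv("PORT", "8000"))
print(f"DEBUG={debug}, PORT={port}, SECRET_KEY set: {bool(os.getenv('SECRET_KEY'))}")
```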
Once the server is running, visit /api/docs for interactive API documentation (Swagger UI).
For detailed documentation, please visit our documentation website.
Project Link: https://github.com/yourusername/paksatalker
- SadTalker - For the amazing talking head generation
- Wav2Lip - For lip-sync technology
- Qwen - For advanced language modeling
- All contributors and open-source maintainers who made this project possible
Run the bundled stable server with fallbacks and background asset prefetch:
```bash
python stable_server.py
# API: http://localhost:8000
# Swagger UI: http://localhost:8000/api/docs
```
Optionally, force asset preparation ahead of time by calling the ensure endpoint:

```bash
curl -X POST http://localhost:8000/api/v1/assets/ensure
```
`POST /api/v1/generate/fusion-video` supports optional background parameters:

- `backgroundMode`: `none` | `blur` | `portrait` | `cinematic` | `color` | `image` | `greenscreen`
- `backgroundColor`: hex color (for `color`/`greenscreen`)
- `backgroundImage`: file (for `image`/`greenscreen`)
- `chromaColor`, `similarity`, `blend`: green-screen chroma key tuning
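For illustration, the background parameters can be sent as form fields (a sketch assuming the requests package; any other fields the route requires, such as the source media, are omitted here):

```python
# Sketch: pass background options to the fusion-video route.
# Only the parameters documented above are shown; other required
# fields for the route are intentionally omitted.
import requests

resp = requests.post(
    "http://localhost:8000/api/v1/generate/fusion-video",
    data={
        "backgroundMode": "color",     # none|blur|portrait|cinematic|color|image|greenscreen
        "backgroundColor": "#101820",  # hex, used for color/greenscreen
    },
    # For image/greenscreen modes, attach the background as a file:
    # files={"backgroundImage": open("background.jpg", "rb")},
)
print(resp.status_code, resp.text)
```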
Suggest presets matching your context:
```bash
curl -X POST http://localhost:8000/api/v1/style-presets/suggest \
  -F prompt="energetic keynote" -F cultural_context=GLOBAL -F formality=0.7
```