A full-featured real-time digital human platform built on LiveTalking, with a complete Web UI, celebrity chat, and multi-engine support.
- Real-time Avatar — Wav2Lip-driven lip-sync, 25 fps real-time rendering, WebRTC streaming
- Celebrity Chat — Immersive full-screen AI persona dialogue, with Einstein/Stalin presets and support for custom celebrities
- Voice Input — Browser-side ASR (Web Speech API); press Space to talk
- 8 TTS Engines — Kokoro local low-latency (175 ms) / EdgeTTS / GPT-SoVITS / CosyVoice, and more
- 5 LLM Providers — Qwen / OpenAI / DeepSeek / Ollama (local) / Custom
- Avatar Management — Upload video to auto-generate digital human with progress tracking
- Video Recording — Record, preview, and download avatar videos
- Action Choreography — Idle / Reply / Think / Leave state switching
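The four states in the last bullet can be thought of as a small state machine. The sketch below is purely illustrative — the state names come from the list above, but the transition rules are assumptions, not the project's actual logic:

```python
from enum import Enum

class AvatarState(Enum):
    IDLE = "idle"      # looping idle video
    THINK = "think"    # LLM is generating a reply
    REPLY = "reply"    # TTS audio is driving lip-sync
    LEAVE = "leave"    # avatar leaves the frame

# Allowed transitions (assumed; e.g. a reply must finish before leaving)
TRANSITIONS = {
    AvatarState.IDLE: {AvatarState.THINK, AvatarState.LEAVE},
    AvatarState.THINK: {AvatarState.REPLY, AvatarState.IDLE},
    AvatarState.REPLY: {AvatarState.IDLE},
    AvatarState.LEAVE: {AvatarState.IDLE},
}

def next_state(current: AvatarState, target: AvatarState) -> AvatarState:
    """Switch to `target` if the transition is allowed, else stay put."""
    return target if target in TRANSITIONS[current] else current
```

With rules like these, an out-of-order request (say, jumping from REPLY straight to LEAVE) is simply ignored instead of cutting off mid-sentence.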
Live preview, text/audio driven, recording, TTS/model selection
Upload video to generate avatar, real-time progress, face-crop preview, search
Immersive full-screen, AI persona chat, voice input, hover to reveal UI
Recorded video management, filter/sort, batch operations, preview/download
Server/TTS/LLM/System config, GPU info, test connection
| Item | Requirement |
|---|---|
| Python | 3.10+ |
| CUDA | 12.0+ |
| GPU VRAM | 4GB+ |
| Node.js | 18+ |
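The Python row of the table can be checked programmatically before installing anything; this helper is hypothetical, not part of the repo:

```python
import sys

def meets_python_requirement(minimum=(3, 10)):
    """True if the running interpreter satisfies the Python 3.10+ requirement."""
    return sys.version_info[:2] >= minimum

print(meets_python_requirement())  # True on Python 3.10 or newer
```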
cd LiveTalking
python -m venv venv
source venv/Scripts/activate # Windows
# source venv/bin/activate # Linux/Mac
pip install torch==2.5.0 torchaudio==2.5.0 torchvision==0.20.0 \
--index-url https://download.pytorch.org/whl/cu124
pip install -r requirements.txt
pip install kokoro soundfile ordered-set "misaki[zh]"

Download the model files from Quark Drive or Google Drive:

- `wav2lip256.pth` → rename to `wav2lip.pth` and place in `models/`
- `wav2lip256_avatar1.tar.gz` → extract to `data/avatars/`
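After downloading, the expected layout from the steps above can be sanity-checked with a small script (this helper is hypothetical, not shipped with the repo):

```python
from pathlib import Path

def missing_model_files(root: Path = Path(".")) -> list[str]:
    """Return the expected model paths that are not present under `root`."""
    expected = [
        root / "models" / "wav2lip.pth",          # renamed wav2lip256.pth
        root / "data" / "avatars",                # extracted avatar archive
    ]
    return [str(p) for p in expected if not p.exists()]

print(missing_model_files())  # empty list means everything is in place
```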
PYTHONPATH=. python avatars/wav2lip/genavatar.py \
--video_path your_video.mp4 --avatar_id your_avatar \
    --img_size 256 --face_det_batch_size 4

Start the backend:

python app.py --transport webrtc --model wav2lip \
    --avatar_id your_avatar --modelres 256 --tts kokoro

Start the frontend:

cd livetalking-ui
npm install
npx vite --port 3000

| Provider | Models | Setup |
|---|---|---|
| Qwen | qwen-plus / qwen-turbo | export DASHSCOPE_API_KEY=sk-xxx |
| OpenAI | gpt-4o / gpt-4o-mini | export OPENAI_API_KEY=sk-xxx |
| DeepSeek | deepseek-chat | export DEEPSEEK_API_KEY=sk-xxx |
| Ollama | qwen2.5:7b / llama3:8b | No key needed, runs locally |
| Custom | Any OpenAI-compatible | Fill in Settings page |
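All five providers in the table speak the OpenAI chat-completions wire format, so a single request builder can serve them. The base URLs below are typical defaults and should be treated as assumptions, not values from the project's config:

```python
import json

# Base URLs are assumptions (common defaults for each provider).
PROVIDERS = {
    "qwen": "https://dashscope.aliyuncs.com/compatible-mode/v1",
    "openai": "https://api.openai.com/v1",
    "deepseek": "https://api.deepseek.com/v1",
    "ollama": "http://localhost:11434/v1",  # local, no API key needed
}

def build_chat_request(provider: str, model: str, user_text: str) -> tuple[str, str]:
    """Return (url, json_body) for an OpenAI-compatible chat completion."""
    url = f"{PROVIDERS[provider]}/chat/completions"
    body = json.dumps({
        "model": model,
        "messages": [{"role": "user", "content": user_text}],
        "stream": True,  # stream tokens so TTS can start speaking early
    })
    return url, body

url, _ = build_chat_request("ollama", "qwen2.5:7b", "Hello")
print(url)  # → http://localhost:11434/v1/chat/completions
```

This is why the Custom row works with "any OpenAI-compatible" endpoint: only the base URL and key change, never the request shape.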
| Metric | Value | Note |
|---|---|---|
| Wav2Lip Inference | 65 fps | Well above real-time |
| Video Output | 25 fps | Stable |
| Kokoro TTS Latency | 175 ms | Local inference |
| EdgeTTS Latency | ~2,400 ms | Online service |
| TTS Speedup | 14x | Kokoro vs EdgeTTS |
| GPU VRAM | ~2.3 GB | RTX 4060 Laptop |
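The speedup row follows directly from the two latency rows (2,400 / 175 ≈ 13.7, which the table rounds to 14x), and the 65 fps inference rate leaves comfortable headroom over the 25 fps output target:

```python
# Figures taken from the performance table above
kokoro_ms = 175      # Kokoro TTS latency (local)
edgetts_ms = 2400    # EdgeTTS latency (online)
infer_fps = 65       # Wav2Lip inference rate
output_fps = 25      # target video output rate

speedup = edgetts_ms / kokoro_ms   # ≈ 13.7, rounded to 14x in the table
headroom = infer_fps / output_fps  # 2.6x real-time margin
print(f"TTS speedup: {speedup:.1f}x, render headroom: {headroom:.1f}x")
```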
┌───────────────────────────────────────────────────────┐
│                        Browser                        │
│  ┌──────────┐ ┌──────────┐ ┌──────────┐ ┌─────────┐   │
│  │Workbench │ │ Avatars  │ │Celebrity │ │Settings │   │
│  └────┬─────┘ └────┬─────┘ └────┬─────┘ └────┬────┘   │
│       └────────────┴────────────┴────────────┘        │
│        Hooks: useWebRTC / useSpeaking / useASR        │
│                   API Layer: api.ts                   │
└───────────────────────────┬───────────────────────────┘
                            │ WebRTC + REST API
┌───────────────────────────┼───────────────────────────┐
│                    Backend (:8010)                    │
│  ┌───────┐ ┌─────┐ ┌─────────┐ ┌───────────┐          │
│  │Wav2Lip│ │ TTS │ │   LLM   │ │  Avatar   │          │
│  │ 65fps │ │175ms│ │Qwen/GPT │ │  Manager  │          │
│  └───────┘ └─────┘ └─────────┘ └───────────┘          │
└───────────────────────────────────────────────────────┘
├── LiveTalking/ # Backend
│ ├── app.py # Main entry
│ ├── llm.py # Multi-provider LLM
│ ├── server/
│ │ ├── avatar_api.py # Avatar CRUD + progress
│ │ ├── celebrity_api.py # Celebrity persona chat
│ │ ├── system_api.py # GPU info + video list
│ │ └── llm_api.py # LLM config API
│ └── tts/
│ └── kokoro_tts.py # Local TTS (175ms)
│
├── livetalking-ui/ # Frontend
│ └── src/
│ ├── App.tsx # Router + Workbench
│ ├── hooks/ # useWebRTC, useSpeaking, useASR
│ ├── services/api.ts # Unified API layer
│ └── pages/ # 5 pages
│
├── assets/ # Screenshots
└── TEST_REPORT.md # 115/115 tests passed
- LiveTalking — Real-time digital human engine
- Kokoro-82M — Local TTS engine
- Ant Design — UI component library
Apache 2.0



