
Commit 8bcc0d9

jcggl and claude committed
refactor: reframe as voice-driven avatar animation engine
Shift messaging from "lip sync" to full animation engine concept:

- Voice → lip sync + facial expressions + eye animation + body motion
- New "What AnimaSync Does" section with animation layer breakdown
- Updated architecture diagram showing expression/blink/body layers
- Richer V1/V2 comparison (expression depth, body motion, VAD)
- Hero banner: "Voice-driven Avatar Animation" subtitle
- Landing page: emotion-aware description
- All example READMEs emphasize multi-layer animation output
- Repo description updated

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
1 parent a4f7786 commit 8bcc0d9

File tree

9 files changed, +121 −84 lines changed


README.md

Lines changed: 84 additions & 50 deletions
@@ -4,9 +4,9 @@
 
 <br><br>
 
-**Real-time audio-to-blendshape lip sync for the browser.**
+**Voice-driven 3D avatar animation engine for the browser.**
 
-Rust/WASM engine that converts speech into ARKit-compatible facial animations at 30fps — entirely client-side.
+Extracts emotion from speech and generates lip sync, facial expressions, and body motion in real time — entirely client-side via Rust/WASM.
 
 <br>
 
@@ -36,33 +36,48 @@ Rust/WASM engine that converts speech into ARKit-compatible facial animations at
 <tr>
 <td width="50%">
 
-**Browser-native WASM**<br>
-<sub>No server needed. Entire pipeline runs in the browser with near-native performance via Rust → WebAssembly compilation.</sub>
+**Voice → Full-body Animation**<br>
+<sub>Not just lip sync. Analyzes speech to generate lip movements, emotional facial expressions, eye blinks, and body poses — all from a single audio stream.</sub>
 
-**ARKit-compatible Output**<br>
-<sub>Standard 52-dim or 111-dim blendshape weight arrays. Works with any 3D framework — Three.js, Babylon.js, Unity WebGL.</sub>
+**Emotion-aware Expressions**<br>
+<sub>Automatically maps vocal characteristics to facial expressions. Eyebrow raises, smile intensity, jaw dynamics, and blink patterns respond to how things are said, not just what is said.</sub>
 
-**Built-in Bone Animation**<br>
-<sub>Embedded VRMA idle/speaking pose clips with automatic crossfade. Natural body movement out of the box.</sub>
+**Built-in Body Motion**<br>
+<sub>Embedded VRMA bone animation clips (idle / speaking poses) with automatic crossfade. Your avatar breathes, shifts weight, and moves naturally — out of the box.</sub>
 
 </td>
 <td width="50%">
 
-**Real-time Streaming**<br>
-<sub>AudioWorklet-based microphone capture with ~300ms latency. Stream TTS audio or process recorded files.</sub>
+**Browser-native WASM**<br>
+<sub>No server needed. Entire pipeline runs in the browser at 30fps with near-native performance via Rust → WebAssembly. ARKit-compatible 52 or 111-dim output.</sub>
 
-**30-day Free Trial**<br>
-<sub>No signup, no API key. Call `init()` and start building. Internet required for license validation only.</sub>
+**Real-time Streaming**<br>
+<sub>AudioWorklet-based microphone capture with ~300ms latency. Feed live mic, TTS, or recorded audio — get animated avatar frames back instantly.</sub>
 
-**Three.js + VRM Ready**<br>
-<sub>First-class integration with @pixiv/three-vrm. Drop a VRM avatar and it just works.</sub>
+**Plug & Play**<br>
+<sub>3 lines of code to go from audio to animated avatar. 30-day free trial, no signup. First-class Three.js + VRM integration.</sub>
 
 </td>
 </tr>
 </table>
 
 ---
 
+## What AnimaSync Does
+
+Most lip sync engines stop at mouth shapes. AnimaSync goes further — it treats voice as the **complete animation source**:
+
+| Layer | What it generates | How |
+|-------|-------------------|-----|
+| **Lip Sync** | Mouth shapes matching phonemes | ONNX inference → ARKit blendshapes (jaw, mouth, tongue) |
+| **Facial Expression** | Emotion-driven brows, cheeks, eyes | Voice energy & pitch → expression mapping + anatomical constraints |
+| **Eye Animation** | Natural blinks, micro-movements | Stochastic blink injection (2.5–4.5s intervals, 15% double-blink) |
+| **Body Motion** | Idle breathing, speaking gestures | Embedded VRMA bone clips with automatic idle ↔ speaking crossfade |
+
+One audio stream in → a fully animated 3D avatar out.
+
+---
+
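The blink parameters quoted in the table above (2.5–4.5 s intervals, 15% double-blink) are concrete enough to sketch. A minimal scheduler in plain JavaScript — illustrative only, assuming a uniform interval distribution and a 0.25 s double-blink gap, not the engine's actual code:

```javascript
// Hypothetical blink scheduler matching the documented parameters:
// one blink every 2.5–4.5 s, with a 15% chance of a double-blink.
// The 0.25 s double-blink gap is an assumption for the sketch.
function scheduleBlinks(durationSec, rand = Math.random) {
  const times = [];
  let t = 2.5 + rand() * 2.0; // first blink lands in [2.5, 4.5)
  while (t < durationSec) {
    times.push(t);
    if (rand() < 0.15) times.push(t + 0.25); // occasional double-blink
    t += 2.5 + rand() * 2.0; // next interval, again [2.5, 4.5)
  }
  return times;
}
```

Each scheduled time would then drive the `eyeBlinkLeft`/`eyeBlinkRight` channels for a few frames.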
 ## Quick Start
 
 ### Install
@@ -85,9 +100,10 @@ import { LipSyncWasmWrapper } from '@goodganglabs/lipsync-wasm-v2';
 const lipsync = new LipSyncWasmWrapper();
 await lipsync.init(); // 30-day free trial — no key needed
 
+// One call — get lip sync + expressions + blinks, all at once
 const result = await lipsync.processFile(audioFile);
 for (let i = 0; i < result.frame_count; i++) {
-  const frame = lipsync.getFrame(result, i); // number[52] — ARKit blendshapes
+  const frame = lipsync.getFrame(result, i); // number[52] — full face animation
   applyToYourAvatar(frame);
 }
 ```
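`applyToYourAvatar` in the snippet above is left to the integrator. One possible shape for it is to name the channels before applying them. The index → name mapping below is purely illustrative (the engine's actual channel order must come from its docs), though the names themselves are standard ARKit blendshape keys:

```javascript
// Illustrative only: turn a number[52] frame into named ARKit-style
// weights. The indices below are assumptions — check the engine's
// documentation for the real channel order.
const CHANNELS = ['jawOpen', 'mouthClose', 'mouthFunnel', 'mouthSmileLeft',
                  'mouthSmileRight', 'eyeBlinkLeft', 'eyeBlinkRight', 'browInnerUp'];

function frameToWeights(frame, channels = CHANNELS) {
  const weights = {};
  channels.forEach((name, i) => { weights[name] = frame[i] ?? 0; });
  return weights;
}
```

How the named weights are applied then depends on the target framework (Three.js morph targets, VRM expressions, etc.).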
@@ -114,9 +130,9 @@ Working examples you can run locally — zero npm install, all loaded from CDN.
 
 | Example | Description | Source |
 |---------|-------------|--------|
-| **[Basic](examples/vanilla-basic/)** | Audio file → blendshape bar chart. No 3D, pure API demo. | [index.html](examples/vanilla-basic/index.html) |
-| **[VRM Avatar](examples/vanilla-avatar/)** | Full 3D avatar with mic, file upload, bone animation. | [index.html](examples/vanilla-avatar/index.html) |
-| **[V1 vs V2](examples/vanilla-comparison/)** | Side-by-side dual avatar comparison. Same audio, two engines. | [index.html](examples/vanilla-comparison/index.html) |
+| **[Basic](examples/vanilla-basic/)** | Audio → animated blendshape visualization. No 3D, pure API demo. | [index.html](examples/vanilla-basic/index.html) |
+| **[VRM Avatar](examples/vanilla-avatar/)** | Full 3D avatar — lip sync, expressions, body motion, mic streaming. | [index.html](examples/vanilla-avatar/index.html) |
+| **[V1 vs V2](examples/vanilla-comparison/)** | Side-by-side dual avatar comparison. Same voice, two animation engines. | [index.html](examples/vanilla-comparison/index.html) |
 
 **Run any example:**
 
@@ -135,58 +151,76 @@ npx serve . # or: python3 -m http.server 8080
 | **Output** | 52-dim ARKit blendshapes | 111-dim ARKit blendshapes |
 | **Model** | Student distillation (direct prediction) | Phoneme classification → viseme mapping |
 | **Post-processing** | crisp_mouth + fade + auto-blink | OneEuroFilter + anatomical constraints |
-| **Idle expressions** | Not included | Built-in `IdleExpressionGenerator` |
-| **Voice activity** | Not included | Built-in `VoiceActivityDetector` |
+| **Expression generation** | Blink injection in post-process | Built-in `IdleExpressionGenerator` (blinks + micro-expressions) |
+| **Voice activity** | Not included | Built-in `VoiceActivityDetector` (body pose switching) |
 | **ONNX fallback** | None (ONNX required) | Heuristic mode (energy-based) |
+| **Body motion** | VRMA idle/speaking (both versions) | VRMA idle/speaking + VAD auto-switch |
 | **Best for** | Most projects, quick integration | Full expression control, custom avatars |
 
 ---
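The `VoiceActivityDetector` and the energy-based heuristic fallback mentioned in the table suggest a simple RMS gate. A toy version with hangover frames so speech doesn't flicker off between words — the threshold and hangover values are illustrative assumptions, not the engine's:

```javascript
// Toy energy-based voice activity detector: RMS energy vs. a fixed
// threshold, plus a few "hangover" frames so short pauses between
// words don't drop the avatar back to its idle pose.
function createVad(threshold = 0.02, hangoverFrames = 5) {
  let hang = 0;
  return function isSpeech(samples) {
    let sum = 0;
    for (let i = 0; i < samples.length; i++) sum += samples[i] * samples[i];
    const rms = Math.sqrt(sum / samples.length);
    if (rms > threshold) { hang = hangoverFrames; return true; }
    if (hang > 0) { hang--; return true; } // stay "speaking" briefly
    return false;
  };
}
```

This kind of speech/silence signal is what drives the idle ↔ speaking body-pose switching described in the table.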
 
 ## Architecture
 
 ```
-                              Browser
-
-Audio Source (File / Mic / TTS)
-      │
-      ▼
-┌──────────┐   ┌────────────┐   ┌─────────────────────┐
-│  WASM    │   │   ONNX     │   │  WASM               │
-│  Feature │──▶│  Inference │──▶│  Post-processing    │
-│  Extract │   │   (JS)     │   │  + Blendshape map   │
-└──────────┘   └────────────┘   └─────────┬───────────┘
-                                          │
-                                          ▼
-                               52 / 111-dim ARKit
-                               Blendshapes @30fps
-                                          │
-                                          ▼
-                               3D Avatar (Three.js,
-                               Babylon, Unity WebGL)
+                              Browser
+
+Audio Source (File / Mic / TTS)
+      │
+      ▼
+┌──────────┐   ┌────────────┐   ┌──────────────────────────────┐
+│  WASM    │   │   ONNX     │   │  WASM                        │
+│  Feature │──▶│  Inference │──▶│  Post-processing             │
+│  Extract │   │   (JS)     │   │  + Expression mapping        │
+└──────────┘   └────────────┘   └────────────┬─────────────────┘
+                                             │
+                   ┌─────────────────────────┼────────────┐
+                   │                         │            │
+                   ▼                         ▼            ▼
+                Lip Sync          Facial Expression     Blinks
+              (jaw, mouth,        (brows, cheeks,      (natural
+                tongue)            smile, frown)      stochastic)
+                   │                         │            │
+                   └────────────┬────────────┘            │
+                                ▼                         │
+              52/111-dim ARKit Blendshapes @30fps         │
+                                │  ◄──────────────────────┘
+                                ▼
+                 ┌───────────────────────────┐
+                 │   VRMA Bone Animation     │
+                 │ idle ↔ speaking crossfade │
+                 │  (body pose + gestures)   │
+                 └─────────────┬─────────────┘
+                               ▼
+              3D Avatar (Three.js / Babylon / Unity)
 ```
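The idle ↔ speaking crossfade at the bottom of the diagram boils down to ramping two clip weights against each other. A sketch of that weight math, assuming a linear ramp and a 0.3 s fade (both assumptions — the engine's real curve and duration are not documented here):

```javascript
// Linear crossfade weights between an idle and a speaking bone clip.
// tSinceSwitch is seconds since the speaking state last changed.
// The 0.3 s fade duration is an illustrative assumption.
function crossfadeWeights(speaking, tSinceSwitch, fadeSec = 0.3) {
  const k = Math.min(1, Math.max(0, tSinceSwitch / fadeSec)); // ramps 0 → 1
  const speakW = speaking ? k : 1 - k;
  return { idle: 1 - speakW, speaking: speakW }; // weights always sum to 1
}
```

The two weights would then feed whatever animation mixer plays the VRMA clips (e.g. per-clip weights in Three.js's `AnimationMixer`).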
 
 ### V2 Pipeline
 
 ```
 Audio 16kHz PCM
 → [WASM] librosa-compatible features: 141-dim @30fps
-→ [JS] ONNX student model: 52-dim direct output
-→ [WASM] crisp_mouth → fade_in_out → add_blinks
-→ [Optional] Preset blending
+→ [JS] ONNX student model → 52-dim (lip sync + expressions)
+→ [WASM] crisp_mouth (mouth sharpening) → fade_in_out (natural onset/offset)
+→ [WASM] add_blinks (stochastic eye animation)
+→ [WASM] Preset blending: expression channels (brows, eyes) blended with lip sync
+→ [VRMA] Bone animation: idle ↔ speaking pose auto-crossfade
 ```
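The `fade_in_out` step above tapers blendshape weights at the start and end of an utterance so the mouth doesn't snap open or closed. A sketch of such an envelope, with an assumed 0.1 s ramp (the engine's actual ramp length is not stated here):

```javascript
// Per-frame envelope: ramps 0 → 1 over the first `ramp` seconds of a
// clip and 1 → 0 over the last `ramp` seconds. Multiply each frame's
// weights by this value. The 0.1 s ramp is an assumption.
function fadeInOut(tSec, durationSec, ramp = 0.1) {
  const fadeIn = Math.min(1, tSec / ramp);
  const fadeOut = Math.min(1, (durationSec - tSec) / ramp);
  return Math.max(0, Math.min(fadeIn, fadeOut));
}
```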
 
 ### V1 Pipeline
 
 ```
 Audio 16kHz PCM
 → [WASM] MFCC extraction: 13-dim @100fps
-→ [JS] ONNX inference: 61 phoneme probabilities
-→ [WASM] Phoneme22 visemes → 111-dim ARKit blendshapes
+→ [JS] ONNX inference: 61 phoneme → 22 visemes
+→ [WASM] Viseme → 111-dim ARKit blendshapes (lip + expression + extras)
 → [WASM] FPS conversion: 100fps → 30fps
-→ [WASM] Anatomical constraints + OneEuroFilter
-→ [Optional] Preset blending (face 40% + mouth 60%)
+→ [WASM] Anatomical constraints (bilateral symmetry + jaw correction)
+→ [WASM] OneEuroFilter (temporal smoothing for natural motion)
+→ [WASM] Preset blending: face 40% (expression) + mouth 60% (lip sync)
+→ [WASM] IdleExpressionGenerator: blinks (2.5–4.5s, 15% double) + micro-expressions
+→ [VRMA] Bone animation: idle ↔ speaking pose crossfade (VAD-triggered)
 ```
 
 ---
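The `OneEuroFilter` stage in the V1 pipeline is the well-known 1€ filter: an exponential low-pass whose cutoff frequency rises with signal speed, so slow jitter is smoothed while fast mouth movements stay sharp. A compact per-channel sketch — the parameter defaults are illustrative, not the engine's actual settings:

```javascript
// Minimal 1€ filter for one blendshape channel. minCutoff and beta
// below are illustrative defaults, not the engine's real values.
function createOneEuro(minCutoff = 1.0, beta = 0.05, dCutoff = 1.0) {
  let prev = null, prevDx = 0;
  const alpha = (cutoff, dt) => 1 / (1 + 1 / (2 * Math.PI * cutoff * dt));
  return function filter(x, dt) {
    if (prev === null) { prev = x; return x; } // first sample passes through
    const dx = (x - prev) / dt;
    const aD = alpha(dCutoff, dt);
    prevDx = aD * dx + (1 - aD) * prevDx;               // smoothed speed estimate
    const cutoff = minCutoff + beta * Math.abs(prevDx); // faster motion → higher cutoff
    const a = alpha(cutoff, dt);
    prev = a * x + (1 - a) * prev;                      // adaptive low-pass
    return prev;
  };
}
```

At 30 fps, `dt` would be `1/30` for every frame.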
@@ -241,10 +275,10 @@ interface ProcessResult {
 
 | Method | Use Case |
 |--------|----------|
-| `processFile(file)` | File upload UI |
-| `processAudio(float32)` | Pre-loaded audio (fetched from API) |
+| `processFile(file)` | File upload → returns lip sync + expression + blink frames |
+| `processAudio(float32)` | Pre-loaded audio (e.g., fetched from TTS API) |
 | `processAudioChunk(chunk)` | Real-time mic / TTS streaming |
-| `getVrmaBytes()` | Bone animations for idle & speaking poses |
+| `getVrmaBytes()` | Bone animation clips for idle breathing & speaking gestures |
 | `reset()` | Clear streaming state between utterances |
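For `processAudioChunk(chunk)`, a caller typically slices the incoming stream into fixed-size `Float32Array` chunks. A sketch of that slicing — the 1024-sample size is an assumption, since the engine's expected chunk size isn't stated here:

```javascript
// Split a PCM buffer into fixed-size views for a streaming API.
// subarray() creates views, not copies; the last chunk may be shorter.
// The 1024-sample chunk size is an illustrative assumption.
function toChunks(samples, chunkSize = 1024) {
  const chunks = [];
  for (let i = 0; i < samples.length; i += chunkSize) {
    chunks.push(samples.subarray(i, i + chunkSize));
  }
  return chunks;
}
```

Each chunk would then be passed to `processAudioChunk()`, with `reset()` called between utterances.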
 
 ### Loading Progress Stages

assets/readme/hero-banner.svg

Lines changed: 3 additions & 3 deletions

examples/vanilla-avatar/README.md

Lines changed: 10 additions & 9 deletions
@@ -1,14 +1,15 @@
 # Vanilla Avatar
 
-Full 3D VRM avatar that lip-syncs to audio using AnimaSync V2. Supports file upload and real-time microphone streaming.
+Full 3D VRM avatar that comes alive from voice alone. Lip sync, emotional facial expressions, natural eye blinks, and body motion — all generated from a single audio stream via AnimaSync V2.
 
 ## What it demonstrates
 
-- Three.js + `@pixiv/three-vrm` avatar rendering
-- VRMA bone animation (idle pose crossfade)
-- Real-time mic streaming via `processAudioChunk()` + AudioWorklet
-- Batch file processing via `processFile()`
-- 52-dim ARKit blendshape application to VRM expressions
+- **Lip sync**: Mouth shapes driven by voice phonemes
+- **Facial expressions**: Brows, cheeks, and eye area respond to vocal characteristics
+- **Eye animation**: Natural stochastic blinks injected automatically
+- **Body motion**: VRMA bone animation (idle breathing ↔ speaking pose crossfade)
+- Real-time mic streaming + batch file processing
+- Three.js + `@pixiv/three-vrm` integration
 
 ## Run locally
 
@@ -28,6 +29,6 @@ Drop any `.vrm` file onto the canvas. Free CC0 avatars are available at:
 ## How it works
 
 1. Page loads → WASM + ONNX model initialized from CDN
-2. Drop a `.vrm` file → Three.js scene renders the avatar with idle bone animation
-3. Upload audio or click Microphone → blendshapes applied to VRM at 30fps
-4. Frame queue pattern: audio processing pushes frames, render loop consumes at 30fps
+2. Drop a `.vrm` file → Three.js scene renders the avatar with idle breathing animation
+3. Upload audio or click Microphone → engine generates lip sync + expressions + blinks
+4. All animation layers (face + body) applied to VRM at 30fps via frame queue

examples/vanilla-avatar/index.html

Lines changed: 1 addition & 1 deletion
@@ -3,7 +3,7 @@
 <head>
 <meta charset="UTF-8">
 <meta name="viewport" content="width=device-width, initial-scale=1.0">
-<title>AnimaSync — VRM Avatar</title>
+<title>AnimaSync — Voice-driven VRM Avatar</title>
 <script type="importmap">
 { "imports": {
 "three": "https://cdn.jsdelivr.net/npm/three@0.179.1/build/three.module.js",

examples/vanilla-basic/README.md

Lines changed: 5 additions & 5 deletions
@@ -1,12 +1,12 @@
 # Vanilla Basic
 
-Minimal AnimaSync example — no 3D avatar, no Three.js. Drop an audio file and watch blendshape values animate in real time.
+Minimal AnimaSync example — no 3D avatar, no Three.js. Drop an audio file and see how voice drives lip sync, facial expression, and blink animation data in real time.
 
 ## What it demonstrates
 
 - Loading `@goodganglabs/lipsync-wasm-v2` from CDN (zero `npm install`)
-- `processFile()` batch API
-- Extracting frames with `getFrame()` and visualizing 23 key ARKit channels
+- `processFile()` batch API — returns lip sync + expressions + blinks in one call
+- Visualizing 23 key ARKit channels: jaw, mouth, eyes, brows, cheeks
 
 ## Run locally
 
@@ -22,7 +22,7 @@ Open `http://localhost:8080` (or the port your server shows).
 ## How it works
 
 1. WASM + ONNX model load from jsdelivr CDN on page load
-2. Drop/select an audio file → `processFile()` returns all frames at once
-3. `requestAnimationFrame` loop plays frames at 30fps, updating bar widths
+2. Drop/select an audio file → `processFile()` returns all animation frames (lip sync + expressions + blinks)
+3. `requestAnimationFrame` loop plays frames at 30fps, showing how each facial channel responds to the voice
 
 No bundler, no framework, single HTML file.

examples/vanilla-basic/index.html

Lines changed: 3 additions & 3 deletions
@@ -3,7 +3,7 @@
 <head>
 <meta charset="UTF-8">
 <meta name="viewport" content="width=device-width, initial-scale=1.0">
-<title>AnimaSync — Basic Example</title>
+<title>AnimaSync — Basic: Voice-driven Animation Data</title>
 <script src="https://cdn.jsdelivr.net/npm/onnxruntime-web@1.17.0/dist/ort.min.js"></script>
 <style>
 *, *::before, *::after { margin: 0; padding: 0; box-sizing: border-box; }
@@ -142,7 +142,7 @@ <h2>Audio Input</h2>
 
 <!-- Right: Blendshapes -->
 <div class="card">
-<h2>ARKit Blendshapes (52-dim)</h2>
+<h2>Face Animation Data (52-dim)</h2>
 <div class="bs-grid" id="bs-grid"></div>
 </div>
 </main>
@@ -155,7 +155,7 @@ <h2>ARKit Blendshapes (52-dim)</h2>
 <script type="module">
 // ================================================================
 // AnimaSync — Vanilla Basic Example
-// No 3D avatar, no Three.js. Pure audio → blendshape visualization.
+// No 3D avatar, no Three.js. Pure audio → lip sync + expression + blink data.
 // ================================================================
 
 const VERSION = '0.3.9';
