
Commit 64d0e06

jcggl and claude committed
feat: bump V2 to 0.4.10 with streaming ONNX and reInferWithEmotion()
- V2 CDN 0.4.8 → 0.4.10 across all examples and configs
- Guide emotion sliders now use reInferWithEmotion() for real-time emotion changes during file playback (debounced 300 ms)
- Mic streaming uses setEmotion() only (no re-infer, to avoid resetting LSTM state)
- API Reference: add reInferWithEmotion() with usage notes and caveats
- V2 pipeline docs: streaming ONNX (UniLSTM + CausalTransformer + FiLM), 5-frame chunks (~167 ms), LSTM state carried between chunks

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
1 parent c44bf88 commit 64d0e06

File tree

7 files changed: +38 −11 lines


.well-known/agent-card.json

Lines changed: 1 addition & 1 deletion

@@ -3,7 +3,7 @@
   "name": "AnimaSync",
   "description": "Voice-driven 3D avatar animation engine for the browser. Extracts emotion from speech and generates lip sync, facial expressions, and body motion in real time — entirely client-side via Rust/WASM and ONNX inference.",
   "url": "https://animasync.quasar.ggls.dev/",
-  "version": "0.4.8",
+  "version": "0.4.10",
   "provider": {
     "organization": "GoodGang Labs",
     "url": "https://goodganglabs.com"

README.md

Lines changed: 3 additions & 1 deletion

@@ -221,7 +221,9 @@ The production site is available at **[animasync.quasar.ggls.dev](https://animas
 ```
 Audio 16kHz PCM
   → [WASM] librosa-compatible features: 141-dim @30fps
-  → [JS] ONNX emotion model + FiLM conditioning → 52-dim (lip sync + expressions)
+  → [JS] Streaming ONNX (UniLSTM + CausalTransformer + FiLM) → 52-dim
+         Inputs: features + 5-dim emotion + LSTM h/c + conv context
+         Chunk size: 5 frames (~167ms), state carried between chunks
   → [WASM] crisp_mouth (mouth sharpening) → fade_in_out (natural onset/offset)
   → [WASM] add_blinks (stochastic eye animation)
   → [WASM] Preset blending: expression channels (brows, eyes) blended with lip sync
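
The chunk arithmetic in the pipeline note above can be checked directly. This is a sketch only, independent of the wrapper: the packaged streaming path does its own chunking and state carry internally, and the `chunks` helper below is purely illustrative.

```javascript
// Sketch only: the chunk sizing implied by the pipeline note above.
// At 16 kHz input and 30 fps features, one frame spans 16000/30,
// about 533 samples, so a 5-frame chunk is ~2665 samples (~167 ms).
const SAMPLE_RATE = 16000;
const FPS = 30;
const FRAMES_PER_CHUNK = 5;
const samplesPerFrame = Math.round(SAMPLE_RATE / FPS);      // 533
const samplesPerChunk = samplesPerFrame * FRAMES_PER_CHUNK; // 2665
const chunkMs = (samplesPerChunk / SAMPLE_RATE) * 1000;     // ~166.6

// Split a mono PCM buffer into ordered chunk-sized slices. The engine
// carries LSTM h/c and conv context between chunks, so feed them in order.
function* chunks(pcm) {
  for (let i = 0; i + samplesPerChunk <= pcm.length; i += samplesPerChunk) {
    yield pcm.subarray(i, i + samplesPerChunk);
  }
}
```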

agents.json

Lines changed: 1 addition & 1 deletion

@@ -150,7 +150,7 @@
   },
   {
     "action": "init-animasync",
-    "code": "<script type=\"module\">\nconst CDN = 'https://cdn.jsdelivr.net/npm/@goodganglabs/lipsync-wasm-v2@0.4.8';\nconst { LipSyncWasmWrapper } = await import(`${CDN}/lipsync-wasm-wrapper.js`);\nconst lipsync = new LipSyncWasmWrapper({ wasmPath: `${CDN}/lipsync_wasm_v2.js` });\nawait lipsync.init();\n</script>",
+    "code": "<script type=\"module\">\nconst CDN = 'https://cdn.jsdelivr.net/npm/@goodganglabs/lipsync-wasm-v2@0.4.10';\nconst { LipSyncWasmWrapper } = await import(`${CDN}/lipsync-wasm-wrapper.js`);\nconst lipsync = new LipSyncWasmWrapper({ wasmPath: `${CDN}/lipsync_wasm_v2.js` });\nawait lipsync.init();\n</script>",
     "description": "Import and initialize AnimaSync V2 from CDN"
   }
 ]

examples/guide/index.html

Lines changed: 15 additions & 3 deletions

@@ -953,7 +953,7 @@ <h2 class="step-title">Add Real-time Microphone</h2>
 // Config
 // ════════════════════════════════════════
 const VERSION_V1 = '0.4.5';
-const VERSION_V2 = '0.4.8';
+const VERSION_V2 = '0.4.10';
 const CDN_V1 = `https://cdn.jsdelivr.net/npm/@goodganglabs/lipsync-wasm-v1@${VERSION_V1}`;
 const CDN_V2 = `https://cdn.jsdelivr.net/npm/@goodganglabs/lipsync-wasm-v2@${VERSION_V2}`;
 let selectedEngine = 'v1';

@@ -1207,10 +1207,22 @@ <h2 class="step-title">Add Real-time Microphone</h2>
 const emotionSliders = EMOTION_KEYS.map(k => $(`emo-${k}`));
 const emotionVals = EMOTION_KEYS.map(k => $(`emo-${k}-val`));

+let _reInferTimer = null;
 function updateEmotionVector() {
   const vec = emotionSliders.map(s => parseInt(s.value) / 100);
-  if (lipsync?.setEmotion) {
-    try { lipsync.setEmotion(vec); } catch (e) { console.warn('setEmotion:', e.message); }
+  if (!lipsync?.setEmotion) return;
+  try { lipsync.setEmotion(vec); } catch (e) { console.warn('setEmotion:', e.message); return; }
+
+  // File playback: debounced re-inference with new emotion (300ms)
+  // Mic streaming: setEmotion() above is enough — each chunk uses current vector
+  if (filePlaying && fileResult && lipsync.reInferWithEmotion && !micActive) {
+    clearTimeout(_reInferTimer);
+    _reInferTimer = setTimeout(async () => {
+      try {
+        const newResult = await lipsync.reInferWithEmotion();
+        if (filePlaying) fileResult = newResult;
+      } catch (e) { console.warn('reInferWithEmotion:', e.message); }
+    }, 300);
   }
 }

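The guide's slider handler is a plain trailing-edge debounce around the re-inference call. Isolated from the page it looks like this; the `lipsync`/`slider` names in the usage comment are assumptions standing in for the guide's own objects.

```javascript
// Trailing-edge debounce: rapid calls collapse into one invocation
// `ms` after the last call. This is the pattern the guide applies to
// reInferWithEmotion() so a slider drag triggers a single re-inference.
function debounce(fn, ms) {
  let timer = null;
  return (...args) => {
    clearTimeout(timer);
    timer = setTimeout(() => fn(...args), ms);
  };
}

// Usage shape (sketch; assumes a lipsync instance as in the guide):
// const reInfer = debounce(() => lipsync.reInferWithEmotion(), 300);
// slider.addEventListener('input', reInfer);
```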
examples/vanilla-avatar/index.html

Lines changed: 1 addition & 1 deletion

@@ -199,7 +199,7 @@ <h2>52 ARKit Blendshapes — V2 Emotion</h2>
 // No 3D avatar, no Three.js. Pure audio → lip sync data (52-dim).
 // ================================================================

-const VERSION = '0.4.8';
+const VERSION = '0.4.10';
 const CDN = `https://cdn.jsdelivr.net/npm/@goodganglabs/lipsync-wasm-v2@${VERSION}`;

 // ── All 52 ARKit blendshape channels ──

examples/vanilla-comparison/index.html

Lines changed: 1 addition & 1 deletion

@@ -254,7 +254,7 @@ <h1>Anima<span>Sync</span></h1>
 // Config
 // ================================================================
 const VERSION_V1 = '0.4.5';
-const VERSION_V2 = '0.4.8';
+const VERSION_V2 = '0.4.10';
 const CDN_V1 = `https://cdn.jsdelivr.net/npm/@goodganglabs/lipsync-wasm-v1@${VERSION_V1}`;
 const CDN_V2 = `https://cdn.jsdelivr.net/npm/@goodganglabs/lipsync-wasm-v2@${VERSION_V2}`;
llms-full.txt

Lines changed: 16 additions & 3 deletions

@@ -138,6 +138,17 @@ lipsync.setEmotion([0, 0.8, 0, 0, 0]); // 80% joy

 Returns the current 5-dim emotion vector.

+#### `reInferWithEmotion(emotionVec?: number[]): Promise<ProcessResult>` (V2 only, v0.4.10+)
+
+Re-run ONNX inference on cached audio features with a new emotion vector, without re-uploading or re-decoding audio. Requires a prior `processFile()`/`processAudio()` call (uses internally cached features). Do NOT call during mic streaming — it resets LSTM state.
+
+```javascript
+const result = await lipsync.processFile(audioFile);
+// Later, change emotion without re-uploading:
+const joyResult = await lipsync.reInferWithEmotion([0, 1.0, 0, 0, 0]);
+const angryResult = await lipsync.reInferWithEmotion([0, 0, 0.8, 0, 0]);
+```
+
 #### `reset(): void`

 Clear streaming state. Call between utterances when using `processAudioChunk`.

@@ -171,7 +182,7 @@ interface ProcessResult {
 | VRM mode | getVrmFrame() + convert_arkit_to_vrm() for VRM 18-dim | getVrmFrame() for VRM 18-dim |
 | Voice activity | Built-in VoiceActivityDetector | Not included |
 | ONNX fallback | Heuristic mode (energy-based) | None (ONNX required) |
-| Emotion control | Not included | 5-dim FiLM conditioning (neutral, joy, anger, sadness, surprise) via setEmotion()/getEmotion() |
+| Emotion control | Not included | 5-dim FiLM conditioning (neutral, joy, anger, sadness, surprise) via setEmotion()/getEmotion()/reInferWithEmotion() |
 | Body motion | VRMA idle/speaking + VAD auto-switch (LoopPingPong, asymmetric crossfade) | VRMA idle/speaking (LoopPingPong, asymmetric crossfade 0.8s/1.0s) |
 | Best for | Full expression control, custom avatars | Emotion-aware lip sync, quick integration |

@@ -184,7 +195,9 @@ interface ProcessResult {
 ```
 Audio 16kHz PCM
 -> [WASM] librosa-compatible features: 141-dim @30fps
--> [JS] ONNX emotion model + FiLM conditioning -> 52-dim (lip sync + expressions)
+-> [JS] Streaming ONNX (UniLSTM + CausalTransformer + FiLM) -> 52-dim
+   Inputs: features + 5-dim emotion + LSTM h/c + conv context
+   Chunk size: 5 frames (~167ms), state carried between chunks
 -> [WASM] crisp_mouth (mouth sharpening) -> fade_in_out (natural onset/offset)
 -> [WASM] add_blinks (stochastic eye animation)
 -> [WASM] Preset blending: expression channels blended with lip sync

@@ -224,7 +237,7 @@ Tongue: tongueOut

 | Example | Description | URL |
 |---------|-------------|-----|
-| Step-by-Step Guide | 6-step interactive tutorial with V1/V2 engine selector, V2 emotion control panel (5 sliders + presets), VRM mode auto-detect, idle eye blink, audio-synced playback, LoopPingPong idle, asymmetric crossfade (V1 0.4.5, V2 0.4.8) | https://animasync.quasar.ggls.dev/examples/guide/ |
+| Step-by-Step Guide | 6-step interactive tutorial with V1/V2 engine selector, V2 emotion control panel (5 sliders + presets), VRM mode auto-detect, idle eye blink, audio-synced playback, LoopPingPong idle, asymmetric crossfade (V1 0.4.5, V2 0.4.10) | https://animasync.quasar.ggls.dev/examples/guide/ |
 | V1 Data | V1 phoneme engine — 52 ARKit blendshapes visualization | https://animasync.quasar.ggls.dev/examples/vanilla-basic/ |
 | V2 Data | V2 emotion model — 52 ARKit with 5-dim FiLM conditioning | https://animasync.quasar.ggls.dev/examples/vanilla-avatar/ |
 | V1 vs V2 | Side-by-side dual avatar comparison | https://animasync.quasar.ggls.dev/examples/vanilla-comparison/ |
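
The 5-dim emotion vector order (neutral, joy, anger, sadness, surprise) recurs throughout the API surface touched by this commit. A small helper that builds one by name; `emotionVec` is hypothetical convenience code, not part of the package:

```javascript
// Hypothetical helper (not in @goodganglabs/lipsync-wasm-v2): build the
// 5-dim emotion vector [neutral, joy, anger, sadness, surprise] by name,
// suitable for setEmotion() or reInferWithEmotion().
const EMOTIONS = ['neutral', 'joy', 'anger', 'sadness', 'surprise'];

function emotionVec(name, intensity = 1.0) {
  const i = EMOTIONS.indexOf(name);
  if (i === -1) throw new Error(`unknown emotion: ${name}`);
  const vec = [0, 0, 0, 0, 0];
  vec[i] = intensity;
  return vec;
}

// emotionVec('joy', 0.8) → [0, 0.8, 0, 0, 0], matching the setEmotion docs.
```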

0 commit comments
