
Commit 3625d78

jcggl and claude committed
feat: replace simplified examples with actual production demo code
Port real working demo code from lipsync-wasm v1/test, v2/test, and demo/ into AnimaSync examples. All imports converted from local paths to jsdelivr CDN (@0.3.9). Example READMEs and landing page updated to reflect V1 (111-dim phoneme) and V2 (52-dim student model) demos.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
1 parent: 8bcc0d9

7 files changed: +4553 −1382 lines


README.md

Lines changed: 2 additions & 2 deletions
@@ -130,8 +130,8 @@ Working examples you can run locally — zero npm install, all loaded from CDN.
 | Example | Description | Source |
 |---------|-------------|--------|
-| **[Basic](examples/vanilla-basic/)** | Audio → animated blendshape visualization. No 3D, pure API demo. | [index.html](examples/vanilla-basic/index.html) |
-| **[VRM Avatar](examples/vanilla-avatar/)** | Full 3D avatar — lip sync, expressions, body motion, mic streaming. | [index.html](examples/vanilla-avatar/index.html) |
+| **[V1 Avatar](examples/vanilla-basic/)** | Full 3D VRM avatar with V1 (111-dim phoneme engine). Dual mode, VAD, idle expressions. | [index.html](examples/vanilla-basic/index.html) |
+| **[V2 Avatar](examples/vanilla-avatar/)** | Full 3D VRM avatar with V2 (52-dim student model). Crisp mouth, direct prediction. | [index.html](examples/vanilla-avatar/index.html) |
 | **[V1 vs V2](examples/vanilla-comparison/)** | Side-by-side dual avatar comparison. Same voice, two animation engines. | [index.html](examples/vanilla-comparison/index.html) |

 **Run any example:**

examples/vanilla-avatar/README.md

Lines changed: 9 additions & 8 deletions
@@ -1,14 +1,16 @@
-# Vanilla Avatar
+# Vanilla Avatar (V2)
 
-Full 3D VRM avatar that comes alive from voice alone. Lip sync, emotional facial expressions, natural eye blinks, and body motion — all generated from a single audio stream via AnimaSync V2.
+Full 3D VRM avatar driven by AnimaSync V2 — the 52-dim student model engine. Lip sync, facial expressions, natural eye blinks, and body motion — all generated from a single audio stream via direct blendshape prediction.
 
 ## What it demonstrates
 
-- **Lip sync**: Mouth shapes driven by voice phonemes
-- **Facial expressions**: Brows, cheeks, and eye area respond to vocal characteristics
-- **Eye animation**: Natural stochastic blinks injected automatically
+- **52-dim ARKit output**: Standard blendshape channels via student model direct prediction
+- **Lip sync**: Crisp mouth shapes with threshold-based sharpening
+- **Facial expressions**: Brows and eye area respond to vocal characteristics
+- **Eye animation**: Natural stochastic blinks injected by post-processing
 - **Body motion**: VRMA bone animation (idle breathing ↔ speaking pose crossfade)
-- Real-time mic streaming + batch file processing
+- **Post-processing**: crisp_mouth + fade_in_out + add_blinks pipeline
+- Real-time mic streaming + batch file processing + TTS
 - Three.js + `@pixiv/three-vrm` integration
 
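The crisp_mouth + fade_in_out + add_blinks pipeline named in the V2 README could be sketched as pure functions. This is a hypothetical illustration, not the library's code: the function names come from the README, but the thresholds, gains, and curve shapes here are assumptions.

```javascript
// Illustrative sketch of a V2-style post-processing chain.
// All numeric defaults are guesses, not the engine's tuned values.

// crisp_mouth: threshold-based sharpening. Kill sub-threshold jitter,
// amplify the remainder, clamp to [0, 1].
function crispMouth(value, threshold = 0.15, gain = 1.4) {
  if (value < threshold) return 0;
  return Math.min(1, (value - threshold) * gain);
}

// fade_in_out: ramp blendshape weights at clip boundaries to avoid popping.
// Returns a 0..1 multiplier for the frame at `frameIndex`.
function fadeInOut(frameIndex, totalFrames, fadeFrames = 5) {
  const fadeIn = Math.min(1, frameIndex / fadeFrames);
  const fadeOut = Math.min(1, (totalFrames - 1 - frameIndex) / fadeFrames);
  return Math.min(fadeIn, fadeOut);
}

// add_blinks: inject a blink as a short triangular pulse (0 → 1 → 0)
// on the eye-blink channels, starting at a stochastically chosen time.
function blinkWeight(tSec, blinkStart, blinkDur = 0.15) {
  const p = (tSec - blinkStart) / blinkDur;
  if (p < 0 || p > 1) return 0;
  return p < 0.5 ? p * 2 : (1 - p) * 2;
}
```

Applied per frame, the multiplier from `fadeInOut` scales every channel, `crispMouth` runs on mouth channels only, and `blinkWeight` is taken as a max with the predicted eye-blink value.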
 ## Run locally
@@ -24,11 +26,10 @@ python3 -m http.server 8080
 
 Drop any `.vrm` file onto the canvas. Free CC0 avatars are available at:
 
 - [VRoid Hub](https://hub.vroid.com/en/models?characterization=allow) — filter by "OK to use as-is"
-- [Mixamo](https://www.mixamo.com/) — for reference animations
 
 ## How it works
 
 1. Page loads → WASM + ONNX model initialized from CDN
 2. Drop a `.vrm` file → Three.js scene renders the avatar with idle breathing animation
-3. Upload audio or click Microphone → engine generates lip sync + expressions + blinks
+3. Upload audio or click Microphone → V2 engine generates direct lip sync + expressions + blinks
 4. All animation layers (face + body) applied to VRM at 30fps via frame queue
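Step 4's "30fps frame queue" can be modeled as a mapping from elapsed audio time to a frame index. `frameAt` is a hypothetical helper for illustration, not part of the library's API:

```javascript
// Minimal model of 30fps frame-queue playback: given precomputed
// per-frame blendshape maps, pick the frame for the current audio time.
const FPS = 30;

function frameAt(frames, elapsedSec, fps = FPS) {
  // Clamp so we hold the last frame once the clip ends.
  const i = Math.min(frames.length - 1, Math.floor(elapsedSec * fps));
  return frames[Math.max(0, i)];
}

// In the real demo this lookup would run inside requestAnimationFrame,
// writing each channel to the avatar, e.g. via three-vrm's
// vrm.expressionManager.setValue(name, weight) (channel mapping assumed).
```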

examples/vanilla-avatar/index.html

Lines changed: 1316 additions & 487 deletions
Large diffs are not rendered by default.

examples/vanilla-basic/README.md

Lines changed: 20 additions & 12 deletions
@@ -1,28 +1,36 @@
-# Vanilla Basic
+# Vanilla Basic (V1)
 
-Minimal AnimaSync example — no 3D avatar, no Three.js. Drop an audio file and see how voice drives lip sync, facial expression, and blink animation data in real time.
+Full 3D VRM avatar driven by AnimaSync V1 — the 111-dim phoneme-based engine. Lip sync, facial expressions (brows, cheeks, tongue), natural eye blinks, and body motion generated from voice via ONNX phoneme classification + viseme mapping.
 
 ## What it demonstrates
 
-- Loading `@goodganglabs/lipsync-wasm-v2` from CDN (zero `npm install`)
-- `processFile()` batch API — returns lip sync + expressions + blinks in one call
-- Visualizing 23 key ARKit channels: jaw, mouth, eyes, brows, cheeks
+- **111-dim blendshape output**: Full ARKit channels including tongue, cheeks, and brows
+- **Phoneme-based pipeline**: Voice → MFCC → Phoneme → Viseme → Blendshape
+- **Dual mode**: ONNX inference with heuristic fallback if ONNX fails
+- **IdleExpressionGenerator**: Natural eye blinks (2.5–4.5s cycle, double-blink 15%)
+- **VoiceActivityDetector**: Auto-switches idle ↔ speaking body pose
+- **OneEuroFilter**: Time-domain smoothing for natural motion
+- Real-time mic streaming + batch file processing + TTS
+- Three.js + `@pixiv/three-vrm` integration
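The OneEuroFilter named in the V1 feature list is the published 1€ filter (Casiez et al., CHI 2012): an adaptive low-pass filter that smooths heavily at low speeds and reduces lag at high speeds. A minimal sketch follows; the parameter defaults are illustrative, not the engine's tuned values.

```javascript
// Sketch of the 1€ filter for smoothing one blendshape channel over time.
class OneEuroFilter {
  constructor(minCutoff = 1.0, beta = 0.007, dCutoff = 1.0) {
    this.minCutoff = minCutoff; // Hz, baseline smoothing
    this.beta = beta;           // speed coefficient: higher = less lag
    this.dCutoff = dCutoff;     // cutoff for the derivative filter
    this.xPrev = null;
    this.dxPrev = 0;
  }

  // Exponential-smoothing factor for a given cutoff and timestep.
  static alpha(cutoff, dt) {
    const tau = 1 / (2 * Math.PI * cutoff);
    return 1 / (1 + tau / dt);
  }

  filter(x, dt) {
    if (this.xPrev === null) { this.xPrev = x; return x; }
    // Estimate and smooth the signal's speed.
    const dx = (x - this.xPrev) / dt;
    const aD = OneEuroFilter.alpha(this.dCutoff, dt);
    const dxHat = aD * dx + (1 - aD) * this.dxPrev;
    // Adapt the cutoff to the speed, then smooth the signal itself.
    const cutoff = this.minCutoff + this.beta * Math.abs(dxHat);
    const a = OneEuroFilter.alpha(cutoff, dt);
    const xHat = a * x + (1 - a) * this.xPrev;
    this.xPrev = xHat;
    this.dxPrev = dxHat;
    return xHat;
  }
}
```

One filter instance per channel, called once per frame with `dt ≈ 1/30`.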

 ## Run locally
 
 ```bash
-# Any static file server works
 npx serve .
 # or
 python3 -m http.server 8080
 ```
 
-Open `http://localhost:8080` (or the port your server shows).
+## VRM Avatar
 
-## How it works
+Drop any `.vrm` file onto the canvas. Free CC0 avatars are available at:
+
+- [VRoid Hub](https://hub.vroid.com/en/models?characterization=allow) — filter by "OK to use as-is"
 
-1. WASM + ONNX model load from jsdelivr CDN on page load
-2. Drop/select an audio file → `processFile()` returns all animation frames (lip sync + expressions + blinks)
-3. `requestAnimationFrame` loop plays frames at 30fps, showing how each facial channel responds to the voice
+## How it works
 
-No bundler, no framework, single HTML file.
+1. Page loads → WASM + ONNX model initialized from CDN
+2. Drop a `.vrm` file → Three.js scene renders the avatar with idle breathing animation
+3. Upload audio or click Microphone → V1 engine generates phoneme-based lip sync + expressions + blinks
+4. All animation layers (face + body) applied to VRM at 30fps via frame queue
+5. Body pose auto-transitions between idle and speaking based on voice activity detection
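The idle ↔ speaking switch in step 5 could be approximated with an RMS-energy detector with a hold period to avoid flicker at word boundaries. `SimpleVAD` is a hypothetical stand-in for the engine's VoiceActivityDetector; its thresholds are assumptions.

```javascript
// Illustrative voice activity detector: flag "speaking" when per-frame
// RMS energy crosses a threshold, and hold the flag briefly afterwards
// so short pauses don't bounce the body pose back to idle.
class SimpleVAD {
  constructor(threshold = 0.02, holdFrames = 10) {
    this.threshold = threshold;   // RMS level treated as speech
    this.holdFrames = holdFrames; // silent frames tolerated before idle
    this.quiet = 0;
    this.speaking = false;
  }

  // `samples` is one audio frame of PCM values in [-1, 1].
  update(samples) {
    let sum = 0;
    for (const s of samples) sum += s * s;
    const rms = Math.sqrt(sum / samples.length);
    if (rms >= this.threshold) {
      this.quiet = 0;
      this.speaking = true;       // crossfade to speaking pose
    } else if (++this.quiet > this.holdFrames) {
      this.speaking = false;      // crossfade back to idle breathing
    }
    return this.speaking;
  }
}
```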

0 commit comments
