Emotive avatar rendering: expressions, gestures, bone-based eye gaze#279
… GPU optimization

- Phase 1-2: Discover VRM expression blend shapes (happy/sad/angry/surprised) in `MorphTargetLayout`. Add an `Emotion` enum and an `EmotionState` resource with smooth lerp transitions, auto-decay (5s), and speech attenuation (30%). New `SetEmotion` command and an `animate_expression` Bevy system that runs every frame.
- Phase 3: Fast deterministic sentiment extraction (<1μs) from AI text via emoji/keyword/punctuation matching. Integrated into `speak_in_call`; emotions drive avatar expressions in real time during conversation.
- GPU optimization: Default render target lowered to 640x360 (4x pixel reduction). HD render-target pool (3 pre-allocated 1280x720 textures) for spotlight views. Fixed silent frame drops from a GPU-bridge dimension mismatch. Added a `model_loaded` gate to prevent a green flash from uninitialized GPU readback.
- Late-joiner sync: Personas that come online after call creation are now swept into the call on the next human join. Fixes a race where batched persona initialization (6 at a time) left later batches excluded. 12 agents now connect (up from 6).
- Display name fix: LiveKit agents use the persona display name instead of a UUID.
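The deterministic sentiment extraction described above can be sketched roughly as follows. The word/emoji tables, thresholds, and function names here are illustrative assumptions, not the actual `TextSentiment` implementation:

```typescript
// Hypothetical sketch of deterministic keyword/emoji/punctuation
// sentiment matching. The real tables live in sentiment.rs /
// TextSentiment.ts and are far larger.
type Emotion = 'happy' | 'sad' | 'angry' | 'surprised' | 'neutral';

const KEYWORDS: Record<Exclude<Emotion, 'neutral'>, string[]> = {
  happy: ['great', 'awesome', 'thanks', '😀', '🎉'],
  sad: ['sorry', 'unfortunately', '😢'],
  angry: ['wrong', 'broken', '😠'],
  surprised: ['wow', 'unexpected', '😮'],
};

function extractEmotion(text: string): Emotion {
  const lower = text.toLowerCase();
  const scores = new Map<Emotion, number>();
  for (const [emotion, words] of Object.entries(KEYWORDS)) {
    let score = 0;
    for (const w of words) {
      if (lower.includes(w)) score += 1;
    }
    if (score > 0) scores.set(emotion as Emotion, score);
  }
  // Punctuation heuristic: repeated "!" leans toward surprise.
  if (/!{2,}/.test(text)) {
    scores.set('surprised', (scores.get('surprised') ?? 0) + 1);
  }
  // Pick the highest-scoring emotion, defaulting to neutral.
  let best: Emotion = 'neutral';
  let bestScore = 0;
  for (const [emotion, score] of scores) {
    if (score > bestScore) {
      best = emotion;
      bestScore = score;
    }
  }
  return best;
}
```

Because it is pure string matching with no model inference, the same input always yields the same emotion, which is what makes the sub-microsecond budget plausible.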
…laces timers

The green speaking highlight was driven by a timer-based setTimeout(durationMs + 500) from voice:ai:speech events, completely uncorrelated with actual audio playback. Three independent clocks (Bevy render, JS timer, WebRTC audio) all tracked the same thing differently, causing the highlight to start/stop at the wrong times.

Fix: Use LiveKit's RoomEvent.ActiveSpeakersChanged, which detects actual audio levels in the browser. This is the ground truth for "who is speaking right now": it accounts for WebRTC encoding latency, network jitter, and audio buffering.

- AudioStreamClient: Add ActiveSpeakersChanged event handler
- LiveWidget: Drive green highlight + auto-spotlight from real audio levels
- Remove timer-based setSpeaking/setSpeakingWithDuration (dead code)
- Keep voice:ai:speech event for captions only (needs text content, not timing)
- Lip sync (Bevy mouth + audio) already aligned: both go through the same WebRTC pipeline with similar latency, so mouth animation matches audio naturally
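Independent of the LiveKit specifics, an ActiveSpeakersChanged handler reduces to diffing the previous and current active-speaker sets to decide which highlights to turn on or off. A minimal sketch, with participants reduced to identity strings for illustration (the real handler receives livekit-client Participant objects):

```typescript
// Compute which participants started or stopped speaking between two
// ActiveSpeakersChanged callbacks. Pure function: no timers involved,
// so the highlight state can only ever reflect real audio levels.
function diffActiveSpeakers(
  previous: Set<string>,
  current: string[],
): { started: string[]; stopped: string[] } {
  const now = new Set(current);
  const started = current.filter((id) => !previous.has(id));
  const stopped = Array.from(previous).filter((id) => !now.has(id));
  return { started, stopped };
}
```

The caller keeps the `current` set as the new `previous` for the next event, so state transitions are driven entirely by the event stream rather than by estimated durations.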
…words

Fix 4 dead animation systems:

- VRM humanoid bone discovery: parse extensions.VRM.humanoid.humanBones for the authoritative bone→node mapping (eyes, hands, lookAt config)
- Bone-based eye gaze: rotate eye bones using VRM lookAt ranges instead of the broken blend-shape path (gaze=0/4 → eyes=2/2 on all models)
- Head turn toward active speaker: smooth exponential-decay interpolation; non-speakers turn ~8-14° toward whoever is talking
- Gesture keyword expansion: 60+ new keywords across all 6 gesture types (Wave, Think, Nod, Shrug, OpenHands, Point) in both Rust and TypeScript
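The exponential-decay interpolation used for the head turns is a standard frame-rate-independent smoothing step. A minimal sketch (the `rate` constant and target angle are illustrative, not the renderer's actual tuning):

```typescript
// Frame-rate-independent exponential decay toward a target value.
// Unlike a fixed-alpha lerp, the convergence speed is the same at
// 30 fps and 144 fps because dt appears inside the exponent.
function expDecayLerp(
  current: number,
  target: number,
  rate: number, // higher = snappier; units of 1/seconds
  dt: number,   // frame delta time in seconds
): number {
  return current + (target - current) * (1 - Math.exp(-rate * dt));
}
```

Each frame the head yaw is pushed toward the angle of the active speaker; when the speaker changes, the same call smoothly re-targets without any explicit animation clip.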
Pull request overview
This PR upgrades the live/avatar stack to support sentiment-driven facial expressions and body gestures, bone-based eye gaze/head turn, and more reliable speaking/spotlight behavior via LiveKit active-speaker events. It also adds CLI-style live session utilities (collaboration/live/send, collaboration/live/export) and introduces mirrored Rust/TypeScript sentiment extraction to keep avatar animation and RAG/export annotations deterministic.
Changes:
- Add Bevy renderer support for emotions, gestures, bone-based eye gaze, head turn toward speaker, render-cadence scheduling, and HD render-target pooling.
- Update Rust ↔ TS voice IPC and LiveKit agent flow (including `display_name`) and switch UI speaking state to LiveKit `ActiveSpeakersChanged`.
- Add live session commands (`live/send`, `live/export`) plus Rust/TS `TextSentiment` implementations for emotion/gesture annotation.
Reviewed changes
Copilot reviewed 23 out of 24 changed files in this pull request and generated 7 comments.
| File | Description |
|---|---|
| src/workers/continuum-core/src/modules/live.rs | Extends voice/speak-in-call IPC params to accept display_name. |
| src/workers/continuum-core/src/live/video/bevy_renderer.rs | Core avatar rendering/animation upgrades: resolution defaults, emotion/gesture commands, bone discovery + eye gaze, head turn, render cadence, HD pool. |
| src/workers/continuum-core/src/live/transport/livekit_agent.rs | Adds effective-dimension mapping for renderer tiers; increases audio queue; injects sentiment-driven emotion/gesture before speech playback; threads display_name through agent creation. |
| src/workers/continuum-core/src/live/session/sentiment.rs | New Rust sentiment/gesture extraction used to drive avatar expression/gestures. |
| src/workers/continuum-core/src/live/session/mod.rs | Exposes the new sentiment module. |
| src/workers/continuum-core/bindings/modules/voice.ts | Updates TS IPC binding for voiceSpeakInCall to include displayName → display_name. |
| src/widgets/live/LiveWidget.ts | Switches speaking/spotlight logic to LiveKit ActiveSpeakersChanged; removes browser relaying of transcriptions; adds spotlight hold timer. |
| src/widgets/live/AudioStreamClient.ts | Emits onActiveSpeakersChanged based on LiveKit room active speakers. |
| src/system/voice/server/index.ts | Adds getTSVoiceOrchestrator() accessor for session context tracking in TS even when Rust voice is enabled. |
| src/system/voice/server/VoiceOrchestrator.ts | Records voice:ai:speech in session context for RAG/export; exposes activeSessionId + participants. |
| src/system/voice/server/AIAudioBridge.ts | Passes displayName into Rust voiceSpeakInCall. |
| src/system/rag/sources/VoiceConversationSource.ts | Annotates RAG voice context with sentiment-derived emotion/gesture labels and a mood summary. |
| src/system/rag/shared/TextSentiment.ts | New TS mirror of Rust sentiment extraction for deterministic RAG/export annotation. |
| src/shared/version.ts | Updates app VERSION constant (but currently mismatched with package version in this PR). |
| src/shared/generated-command-constants.ts | Adds generated constants for collaboration/live/send and collaboration/live/export. |
| src/server/generated.ts | Registers new server commands in the generated registry. |
| src/package.json | Version bump. |
| src/package-lock.json | Version bump. |
| src/generated-command-schemas.json | Regenerates command schema metadata (timestamp + new commands). |
| src/commands/collaboration/live/send/shared/LiveSendTypes.ts | New shared types/executor for collaboration/live/send. |
| src/commands/collaboration/live/send/server/LiveSendServerCommand.ts | New server implementation for injecting a text message into the active live session. |
| src/commands/collaboration/live/join/server/LiveJoinServerCommand.ts | Registers session with TS orchestrator; syncs “late joiners” into existing calls. |
| src/commands/collaboration/live/export/shared/LiveExportTypes.ts | New shared types/executor for collaboration/live/export. |
| src/commands/collaboration/live/export/server/LiveExportServerCommand.ts | New server implementation exporting recent utterances to markdown (optional sentiment annotations). |
Files not reviewed (1)
- src/package-lock.json: Language not supported
```ts
import { Commands } from '@system/core/shared/Commands';
import type { DataListResult } from '@commands/data/list/shared/DataListTypes';
```

LiveSendServerCommand has unused imports (`Commands`, `DataListResult`), which will trigger lint/tsc failures in setups with unused-import checks. Please remove them (or use them) to keep the file warning-free.

Suggested change: remove both import lines.
```ts
const responderCount = transcriptionResult.success
  ? parseInt(transcriptionResult.message.match(/(\d+) AI/)?.[1] ?? '0', 10)
  : 0;
```

`responderCount` is derived by regex-parsing `transcriptionResult.message` (e.g. `/(\d+) AI/`). This is brittle because it couples LiveSend to the exact wording of another command's message string. Prefer returning a structured count from `collaboration/live/transcription` (or computing responders directly) so LiveSend doesn't break if the message text changes.

Suggested change:

```ts
let responderCount = 0;
if (transcriptionResult.success) {
  const structuredResponderCount = (transcriptionResult as any).responderCount;
  if (typeof structuredResponderCount === 'number' && Number.isFinite(structuredResponderCount) && structuredResponderCount >= 0) {
    responderCount = structuredResponderCount;
  } else {
    const match = transcriptionResult.message.match(/(\d+)\s+AI/);
    if (match) {
      responderCount = parseInt(match[1], 10);
    }
  }
}
```
```diff
 */
-export const VERSION = '1.0.8444';
+export const VERSION = '1.0.8469';
```

`VERSION` here (1.0.8469) is out of sync with `src/package.json` / `src/package-lock.json` (1.0.8470 in this PR). This mismatch can cause confusing runtime/version reporting. Please update `src/shared/version.ts` to match the package version (or ensure your version-bump script updates all three consistently).

Suggested change:

```diff
-export const VERSION = '1.0.8469';
+export const VERSION = '1.0.8470';
```
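A minimal sketch of the consistency check such a version-bump script could run. The helper is hypothetical and the parsing deliberately simplified; a real script would read `src/shared/version.ts` and `src/package.json` from disk:

```typescript
// Compare the VERSION constant in a version.ts source string against
// the "version" field of a package.json string. Returns true only
// when both are present and equal.
function versionsInSync(versionTs: string, packageJson: string): boolean {
  const tsMatch = versionTs.match(/VERSION\s*=\s*'([^']+)'/);
  const pkgVersion = JSON.parse(packageJson).version as string | undefined;
  return !!tsMatch && !!pkgVersion && tsMatch[1] === pkgVersion;
}
```

Running such a check in CI (or at the end of the bump script) would have flagged the 1.0.8469 / 1.0.8470 mismatch before review.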
```rust
if state.current == Emotion::Neutral || effective_weight < 0.001 {
    continue;
}

// Apply the blend shape weight for the current emotion
if let Ok(mut weights) = morph_weights.get_mut(layout.mesh_entity) {
    let w = weights.weights_mut();
    let idx = match state.current {
        Emotion::Happy => layout.happy_index,
        Emotion::Sad => layout.sad_index,
        Emotion::Angry => layout.angry_index,
        Emotion::Surprised => layout.surprised_index,
        Emotion::Relaxed => layout.relaxed_index,
        Emotion::Neutral => None,
    };
    if let Some(i) = idx {
        if i < w.len() {
            w[i] = effective_weight;
        }
    }
}
```

In `animate_expression`, expression morph weights are only ever written for the currently active emotion, and the function `continue`s for Neutral / near-zero effective weight. This can leave stale weights from a previous emotion (or from a previous frame) on other expression indices, so expressions may "stick" or blend unintentionally. Consider explicitly zeroing all emotion-related morph targets each frame (happy/sad/angry/surprised/relaxed) before setting the active one, and ensure the neutral/zero-weight paths clear any previously-set weights.
```rust
finished.push(*slot);
continue;
```

`animate_body_gestures` removes a gesture as soon as `elapsed >= total_end` and `continue`s before applying a final pose at weight=0. Because no other system resets arm/head bones, this can leave bones slightly offset from their rest transforms after the gesture ends (depending on frame step). Consider clamping `elapsed` to `total_end` and applying one last update (weight=0), or explicitly restoring the affected bones to their `rest_rotation`/`rest_translation` when the gesture is finished.

Suggested change:

```rust
// Clamp to the exact end of the gesture and mark as finished,
// but still run one last update with weight = 0.0.
anim.elapsed = total_end;
finished.push(*slot);
```
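The clamp-and-final-update idea can be sketched as follows (TypeScript for brevity; the types, the linear fade, and field names are illustrative, not the actual Rust gesture system):

```typescript
// Advance a gesture animation, clamping elapsed time to the gesture's
// end. On the finishing frame the weight is forced to 0 so the bones
// driven by this gesture land exactly on their rest pose before the
// gesture is removed.
interface GestureAnim {
  elapsed: number;  // seconds since gesture start
  totalEnd: number; // gesture duration in seconds
}

function gestureWeight(
  anim: GestureAnim,
  dt: number,
): { weight: number; finished: boolean } {
  anim.elapsed = Math.min(anim.elapsed + dt, anim.totalEnd);
  const finished = anim.elapsed >= anim.totalEnd;
  // Simple linear fade-out stand-in; real easing curves differ.
  const weight = finished ? 0 : 1 - anim.elapsed / anim.totalEnd;
  return { weight, finished };
}
```

The caller applies the returned weight once more even when `finished` is true, and only then drops the gesture, so no frame-step size can strand bones mid-pose.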
```rust
/// Max eye bone Y-rotation (radians) for looking left/right (inward)
horizontal_inner_deg: f32,
/// Max eye bone Y-rotation (radians) for looking left/right (outward)
horizontal_outer_deg: f32,
/// Max eye bone X-rotation (radians) for looking up
vertical_up_deg: f32,
/// Max eye bone X-rotation (radians) for looking down
```

The `VrmLookAtConfig` fields are named `*_deg`, and defaults are specified in degrees, but the doc comments say "(radians)". This is misleading and makes it easy to misuse these values. Update the comments to consistently describe degrees (or rename the fields if they are meant to be radians).

Suggested change:

```rust
/// Max eye bone Y-rotation in degrees for looking left/right (inward)
horizontal_inner_deg: f32,
/// Max eye bone Y-rotation in degrees for looking left/right (outward)
horizontal_outer_deg: f32,
/// Max eye bone X-rotation in degrees for looking up
vertical_up_deg: f32,
/// Max eye bone X-rotation in degrees for looking down
```
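Keeping the config in degrees and converting at use time looks like this in sketch form (TypeScript for brevity; the helper and the example limits are illustrative, not the renderer's actual values):

```typescript
// Clamp a requested eye yaw (degrees) to the configured inner/outer
// lookAt limit, then convert to radians for the bone rotation.
// VRM lookAt ranges distinguish inward (toward the nose) from
// outward rotation, hence the two maxima.
const DEG_TO_RAD = Math.PI / 180;

function clampedEyeYaw(
  yawDeg: number,
  innerMaxDeg: number,
  outerMaxDeg: number,
  lookingInward: boolean,
): number {
  const max = lookingInward ? innerMaxDeg : outerMaxDeg;
  const clamped = Math.max(-max, Math.min(max, yawDeg));
  return clamped * DEG_TO_RAD;
}
```

Doing the degree-to-radian conversion in exactly one place is what makes the `*_deg` naming safe: every consumer receives radians and never touches the raw config values.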
```rust
// Swap camera to the new render target
commands.entity(state.camera_entity).insert(
    RenderTarget::Image(new_rt_handle.clone().into()),
);
```

Resize swaps the camera to a new render target and then respawns the readback entity via `spawn_readback_entity(...)`, which immediately inserts the one-shot `Readback` component. This bypasses the new gating (only start readback once the slot is active + `model_loaded` + camera active) and can reintroduce wasted readback / uninitialized-texture artifacts if a resize happens before the model is ready. Consider respawning with the non-starting variant and letting `ensure_continuous_readback` enable readback when the slot is ready (or computing `start_active` from the slot state/camera activity).
Summary

- Add `live/send` (inject messages) and `live/export` (transcript with emotion annotations)

Test plan

- `eyes=2/2, hands=2/2, lookAt=true` on all 6 slots
- `live/send` triggers AI responses with emotion/gesture annotations

🤖 Generated with Claude Code