
Emotive avatar rendering: expressions, gestures, bone-based eye gaze #279

Merged
joelteply merged 4 commits into main from feature/avatar-emotive-rendering
Mar 2, 2026

Conversation

@joelteply
Contributor

Summary

  • Emotion blend shapes: Happy, Sad, Angry, Surprised, Relaxed expressions driven by sentiment analysis of AI speech text (VRM presets discovered automatically)
  • 6 body gestures: Wave, Think, Nod, Shrug, Point, OpenHands — arm bone animations triggered by keyword matching in speech content
  • Bone-based eye gaze: VRM humanoid bone discovery replaces broken blend shape lookAt (all models use bone-type eyes). Eye bones rotate using VRM lookAt range config.
  • Head turn toward speaker: Non-speaking avatars smoothly turn head toward active speaker (~8-14°), exponential-decay interpolation
  • Speaking sync: ActiveSpeakersChanged events from LiveKit replace timer-based speaking indicators
  • Live commands: live/send (inject messages) and live/export (transcript with emotion annotations)
  • TextSentiment: Rust + TypeScript mirror implementations for deterministic sentiment extraction
  • 60+ gesture keywords across both languages for higher trigger rate

Test plan

  • All 14 Rust sentiment tests pass
  • Bone discovery logs: eyes=2/2, hands=2/2, lookAt=true on all 6 slots
  • Visual verification: head turn toward speaker observed in live call
  • Clean build: zero warnings in both dev and release mode
  • live/send triggers AI responses with emotion/gesture annotations

🤖 Generated with Claude Code

… GPU optimization

Phase 1-2: Discover VRM expression blend shapes (happy/sad/angry/surprised) in
MorphTargetLayout. Add an Emotion enum, an EmotionState resource with smooth lerp
transitions, auto-decay (5s), and speech attenuation (30%), plus a new SetEmotion
command and an animate_expression Bevy system that runs every frame.
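The lerp/decay/attenuation behavior described above can be sketched as follows. This is an illustrative TypeScript model of the described logic (the real system is the Rust/Bevy animate_expression system); the rate constant and field names are assumptions, while the 5s decay window and 30% attenuation come from this description.

```typescript
// Hypothetical model of the EmotionState update described above.
// DECAY_AFTER_MS and SPEAK_ATTENUATION mirror the PR's "5s" and "30%";
// LERP_RATE and the field names are illustrative.

type Emotion = "neutral" | "happy" | "sad" | "angry" | "surprised" | "relaxed";

interface EmotionState {
  current: Emotion;
  weight: number;  // applied blend-shape weight, 0..1
  target: number;  // weight we are lerping toward
  setAtMs: number; // when SetEmotion was last issued
}

const DECAY_AFTER_MS = 5000;   // auto-decay window from the description
const SPEAK_ATTENUATION = 0.3; // 30% attenuation while speaking
const LERP_RATE = 8.0;         // smoothing rate, illustrative

function tickEmotion(s: EmotionState, nowMs: number, dtSec: number, speaking: boolean): number {
  // After 5s without a new SetEmotion, decay back toward neutral.
  if (nowMs - s.setAtMs > DECAY_AFTER_MS) s.target = 0;
  // Frame-rate-independent exponential smoothing toward the target.
  const k = 1 - Math.exp(-LERP_RATE * dtSec);
  s.weight += (s.target - s.weight) * k;
  // Attenuate the expression while the mouth is busy with visemes.
  return speaking ? s.weight * (1 - SPEAK_ATTENUATION) : s.weight;
}
```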

Phase 3: Fast deterministic sentiment extraction (<1μs) from AI text via
emoji/keyword/punctuation matching. Integrated into speak_in_call — emotions
drive avatar expressions in real-time during conversation.
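A minimal TypeScript sketch of emoji/keyword/punctuation matching in the spirit of the extraction described above. The keyword tables, scoring, and tie-breaking here are invented for illustration and are not the actual TextSentiment implementation.

```typescript
// Illustrative keyword/emoji/punctuation sentiment matcher.
// All word lists below are assumptions, not the real tables.

type Emotion = "neutral" | "happy" | "sad" | "angry" | "surprised";

const KEYWORDS: Record<Exclude<Emotion, "neutral">, string[]> = {
  happy: ["great", "awesome", "glad", "🎉"],
  sad: ["sorry", "unfortunately", "sadly", "😢"],
  angry: ["terrible", "unacceptable", "furious", "😠"],
  surprised: ["wow", "unexpected", "incredible", "😮"],
};

function extractEmotion(text: string): Emotion {
  const lower = text.toLowerCase();
  let best: Emotion = "neutral";
  let bestScore = 0;
  for (const [emotion, words] of Object.entries(KEYWORDS)) {
    let score = words.filter((w) => lower.includes(w)).length;
    // Punctuation heuristic: '!' slightly boosts excitable emotions.
    if (score > 0 && text.includes("!") && (emotion === "happy" || emotion === "surprised")) {
      score += 1;
    }
    if (score > bestScore) {
      bestScore = score;
      best = emotion as Emotion;
    }
  }
  return best;
}
```

Because the matcher is pure string lookup with no model calls, the same text always yields the same emotion, which is what makes the Rust and TypeScript mirrors stay deterministic across the IPC boundary.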

GPU optimization: Default render target lowered to 640x360 (4x pixel reduction).
HD render target pool (3 pre-allocated 1280x720 textures) for spotlight views.
Fixed silent frame drops from GPU bridge dimension mismatch. Added model_loaded
gate to prevent green flash from uninitialized GPU readback.
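The pool's acquire/release behavior can be modeled roughly as below. The class and method names are hypothetical (the real pool lives in the Bevy renderer); only the counts and dimensions come from the description above.

```typescript
// Sketch of a fixed-size render-target pool: 3 pre-allocated 1280x720
// targets for spotlight views, falling back to the 640x360 default when
// exhausted. Names are illustrative, not from the actual renderer.

interface RenderTarget { id: number; width: number; height: number; }

class RenderTargetPool {
  private free: RenderTarget[] = [];
  constructor(count: number, width: number, height: number) {
    // Pre-allocate up front so spotlight promotion never allocates mid-frame.
    for (let i = 0; i < count; i++) this.free.push({ id: i, width, height });
  }
  // Returns a pre-allocated target, or null when all are in use.
  acquire(): RenderTarget | null {
    return this.free.pop() ?? null;
  }
  release(rt: RenderTarget): void {
    this.free.push(rt);
  }
}
```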

Late-joiner sync: Personas that come online after call creation are now swept
into the call on next human join. Fixes race where batched persona initialization
(6 at a time) left later batches excluded. 12 agents now connect (up from 6).
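The sweep amounts to a set difference between online personas and current call participants; a hedged sketch (function and parameter names are invented for illustration):

```typescript
// On the next human join, find online personas not yet in the call so they
// can be swept in. Pure set difference; names here are hypothetical.
function sweepLateJoiners(
  onlinePersonas: ReadonlySet<string>,
  callParticipants: ReadonlySet<string>,
): string[] {
  const missing: string[] = [];
  for (const id of onlinePersonas) {
    if (!callParticipants.has(id)) missing.push(id);
  }
  return missing;
}
```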

Display name fix: LiveKit agents use persona display name instead of UUID.
…laces timers

The green speaking highlight was driven by timer-based setTimeout(durationMs + 500)
from voice:ai:speech events — completely uncorrelated with actual audio playback.
Three independent clocks (Bevy render, JS timer, WebRTC audio) all tracked the
same thing differently, causing the highlight to start/stop at wrong times.

Fix: Use LiveKit's RoomEvent.ActiveSpeakersChanged which detects actual audio
levels at the browser. This is the ground truth for "who is speaking right now" —
it accounts for WebRTC encoding latency, network jitter, and audio buffering.

- AudioStreamClient: Add ActiveSpeakersChanged event handler
- LiveWidget: Drive green highlight + auto-spotlight from real audio levels
- Remove timer-based setSpeaking/setSpeakingWithDuration (dead code)
- Keep voice:ai:speech event for captions only (needs text content, not timing)
- Lip sync (Bevy mouth + audio) already aligned: both go through same WebRTC
  pipeline with similar latency, so mouth animation matches audio naturally
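A sketch of wiring the highlight to active-speaker changes. The diff helper below is a pure illustration (not taken from this PR); the commented subscription follows livekit-client's documented event API but the handler names around it are assumptions.

```typescript
// Given the previously highlighted set and the new active-speaker
// identities, compute who to un-highlight and who to highlight.
function diffSpeakers(
  prev: ReadonlySet<string>,
  active: readonly string[],
): { off: string[]; on: string[] } {
  const next = new Set(active);
  return {
    off: [...prev].filter((id) => !next.has(id)),
    on: active.filter((id) => !prev.has(id)),
  };
}

// Subscription sketch (assumes livekit-client; not executed here):
// room.on(RoomEvent.ActiveSpeakersChanged, (speakers) => {
//   const { off, on } = diffSpeakers(highlighted, speakers.map((s) => s.identity));
//   off.forEach(clearHighlight); // hypothetical UI helpers
//   on.forEach(setHighlight);
// });
```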
…words

Fix 4 dead animation systems:
- VRM humanoid bone discovery: parse extensions.VRM.humanoid.humanBones
  for authoritative bone→node mapping (eyes, hands, lookAt config)
- Bone-based eye gaze: rotate eye bones using VRM lookAt ranges instead
  of broken blend shape path (gaze=0/4 → eyes=2/2 on all models)
- Head turn toward active speaker: smooth exponential-decay interpolation,
  non-speakers turn ~8-14° toward whoever is talking
- Gesture keyword expansion: 60+ new keywords across all 6 gesture types
  (Wave, Think, Nod, Shrug, OpenHands, Point) in both Rust and TypeScript
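The exponential-decay head-turn step can be sketched as below. The ~8-14° clamp comes from the description above; the rate constant and function shape are assumptions about the Rust implementation.

```typescript
// Frame-rate-independent exponential-decay interpolation toward the
// active speaker. MAX_TURN_RAD reflects the ~8-14° range above;
// TURN_RATE is illustrative.

const MAX_TURN_RAD = (14 * Math.PI) / 180;
const TURN_RATE = 6.0;

// Unlike a fixed lerp factor, 1 - exp(-rate*dt) converges identically
// regardless of frame rate: N small steps equal one big step of the
// same total duration.
function headTurnStep(current: number, target: number, dtSec: number): number {
  const clamped = Math.max(-MAX_TURN_RAD, Math.min(MAX_TURN_RAD, target));
  const k = 1 - Math.exp(-TURN_RATE * dtSec);
  return current + (clamped - current) * k;
}
```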
Copilot AI review requested due to automatic review settings March 2, 2026 05:03
@joelteply joelteply merged commit ca4946c into main Mar 2, 2026
2 of 5 checks passed
@joelteply joelteply deleted the feature/avatar-emotive-rendering branch March 2, 2026 05:03
Contributor

Copilot AI left a comment


Pull request overview

This PR upgrades the live/avatar stack to support sentiment-driven facial expressions and body gestures, bone-based eye gaze/head turn, and more reliable speaking/spotlight behavior via LiveKit active-speaker events. It also adds CLI-style live session utilities (collaboration/live/send, collaboration/live/export) and introduces mirrored Rust/TypeScript sentiment extraction to keep avatar animation and RAG/export annotations deterministic.

Changes:

  • Add Bevy renderer support for emotions, gestures, bone-based eye gaze, head turn toward speaker, render-cadence scheduling, and HD render-target pooling.
  • Update Rust ↔ TS voice IPC and LiveKit agent flow (including display_name) and switch UI speaking state to LiveKit ActiveSpeakersChanged.
  • Add live session commands (live/send, live/export) plus Rust/TS TextSentiment implementations for emotion/gesture annotation.

Reviewed changes

Copilot reviewed 23 out of 24 changed files in this pull request and generated 7 comments.

Show a summary per file
File Description
src/workers/continuum-core/src/modules/live.rs Extends voice/speak-in-call IPC params to accept display_name.
src/workers/continuum-core/src/live/video/bevy_renderer.rs Core avatar rendering/animation upgrades: resolution defaults, emotion/gesture commands, bone discovery + eye gaze, head turn, render cadence, HD pool.
src/workers/continuum-core/src/live/transport/livekit_agent.rs Adds effective-dimension mapping for renderer tiers; increases audio queue; injects sentiment-driven emotion/gesture before speech playback; threads display_name through agent creation.
src/workers/continuum-core/src/live/session/sentiment.rs New Rust sentiment/gesture extraction used to drive avatar expression/gestures.
src/workers/continuum-core/src/live/session/mod.rs Exposes the new sentiment module.
src/workers/continuum-core/bindings/modules/voice.ts Updates TS IPC binding for voiceSpeakInCall to include display_name.
src/widgets/live/LiveWidget.ts Switches speaking/spotlight logic to LiveKit ActiveSpeakersChanged; removes browser relaying of transcriptions; adds spotlight hold timer.
src/widgets/live/AudioStreamClient.ts Emits onActiveSpeakersChanged based on LiveKit room active speakers.
src/system/voice/server/index.ts Adds getTSVoiceOrchestrator() accessor for session context tracking in TS even when Rust voice is enabled.
src/system/voice/server/VoiceOrchestrator.ts Records voice:ai:speech in session context for RAG/export; exposes activeSessionId + participants.
src/system/voice/server/AIAudioBridge.ts Passes displayName into Rust voiceSpeakInCall.
src/system/rag/sources/VoiceConversationSource.ts Annotates RAG voice context with sentiment-derived emotion/gesture labels and a mood summary.
src/system/rag/shared/TextSentiment.ts New TS mirror of Rust sentiment extraction for deterministic RAG/export annotation.
src/shared/version.ts Updates app VERSION constant (but currently mismatched with package version in this PR).
src/shared/generated-command-constants.ts Adds generated constants for collaboration/live/send and collaboration/live/export.
src/server/generated.ts Registers new server commands in the generated registry.
src/package.json Version bump.
src/package-lock.json Version bump.
src/generated-command-schemas.json Regenerates command schema metadata (timestamp + new commands).
src/commands/collaboration/live/send/shared/LiveSendTypes.ts New shared types/executor for collaboration/live/send.
src/commands/collaboration/live/send/server/LiveSendServerCommand.ts New server implementation for injecting a text message into the active live session.
src/commands/collaboration/live/join/server/LiveJoinServerCommand.ts Registers session with TS orchestrator; syncs “late joiners” into existing calls.
src/commands/collaboration/live/export/shared/LiveExportTypes.ts New shared types/executor for collaboration/live/export.
src/commands/collaboration/live/export/server/LiveExportServerCommand.ts New server implementation exporting recent utterances to markdown (optional sentiment annotations).
Files not reviewed (1)
  • src/package-lock.json: Language not supported


Comment on lines +15 to +16
import { Commands } from '@system/core/shared/Commands';
import type { DataListResult } from '@commands/data/list/shared/DataListTypes';

Copilot AI Mar 2, 2026


LiveSendServerCommand has unused imports (Commands, DataListResult) which will trigger lint/tsc failures in setups with unused-import checks. Please remove them (or use them) to keep the file warning-free.

Suggested change
import { Commands } from '@system/core/shared/Commands';
import type { DataListResult } from '@commands/data/list/shared/DataListTypes';

Comment on lines +72 to +75
const responderCount = transcriptionResult.success
  ? parseInt(transcriptionResult.message.match(/(\d+) AI/)?.[1] ?? '0', 10)
  : 0;


Copilot AI Mar 2, 2026


responderCount is derived by regex-parsing transcriptionResult.message (e.g. /(\d+) AI/). This is brittle because it couples LiveSend to the exact wording of another command's message string. Prefer returning a structured count from collaboration/live/transcription (or computing responders directly) so LiveSend doesn't break if the message text changes.

Suggested change
const responderCount = transcriptionResult.success
  ? parseInt(transcriptionResult.message.match(/(\d+) AI/)?.[1] ?? '0', 10)
  : 0;
let responderCount = 0;
if (transcriptionResult.success) {
  const structuredResponderCount = (transcriptionResult as any).responderCount;
  if (typeof structuredResponderCount === 'number' && Number.isFinite(structuredResponderCount) && structuredResponderCount >= 0) {
    responderCount = structuredResponderCount;
  } else {
    const match = transcriptionResult.message.match(/(\d+)\s+AI/);
    if (match) {
      responderCount = parseInt(match[1], 10);
    }
  }
}

*/

export const VERSION = '1.0.8444';
export const VERSION = '1.0.8469';

Copilot AI Mar 2, 2026


VERSION here (1.0.8469) is out of sync with src/package.json / src/package-lock.json (1.0.8470 in this PR). This mismatch can cause confusing runtime/version reporting. Please update src/shared/version.ts to match the package version (or ensure your version-bump script updates all three consistently).

Suggested change
export const VERSION = '1.0.8469';
export const VERSION = '1.0.8470';

Comment on lines +2194 to +2214
if state.current == Emotion::Neutral || effective_weight < 0.001 {
    continue;
}

// Apply the blend shape weight for the current emotion
if let Ok(mut weights) = morph_weights.get_mut(layout.mesh_entity) {
    let w = weights.weights_mut();
    let idx = match state.current {
        Emotion::Happy => layout.happy_index,
        Emotion::Sad => layout.sad_index,
        Emotion::Angry => layout.angry_index,
        Emotion::Surprised => layout.surprised_index,
        Emotion::Relaxed => layout.relaxed_index,
        Emotion::Neutral => None,
    };
    if let Some(i) = idx {
        if i < w.len() {
            w[i] = effective_weight;
        }
    }
}

Copilot AI Mar 2, 2026


In animate_expression, expression morph weights are only ever written for the currently active emotion, and the function continues for Neutral / near-zero effective weight. This can leave stale weights from a previous emotion (or from a previous frame) on other expression indices, so expressions may “stick” or blend unintentionally. Consider explicitly zeroing all emotion-related morph targets each frame (happy/sad/angry/surprised/relaxed) before setting the active one, and ensure neutral/zero-weight paths clear any previously-set weights.

Comment on lines +2615 to +2616
finished.push(*slot);
continue;

Copilot AI Mar 2, 2026


animate_body_gestures removes a gesture as soon as elapsed >= total_end and continues before applying a final pose at weight=0. Because no other system resets arm/head bones, this can leave bones slightly offset from their rest transforms after the gesture ends (depending on frame step). Consider clamping elapsed to total_end and applying one last update (weight=0), or explicitly restoring the affected bones to their rest_rotation/rest_translation when the gesture is finished.

Suggested change
finished.push(*slot);
continue;
// Clamp to the exact end of the gesture and mark as finished,
// but still run one last update with weight = 0.0.
anim.elapsed = total_end;
finished.push(*slot);

Comment on lines +619 to +625
/// Max eye bone Y-rotation (radians) for looking left/right (inward)
horizontal_inner_deg: f32,
/// Max eye bone Y-rotation (radians) for looking left/right (outward)
horizontal_outer_deg: f32,
/// Max eye bone X-rotation (radians) for looking up
vertical_up_deg: f32,
/// Max eye bone X-rotation (radians) for looking down

Copilot AI Mar 2, 2026


The VrmLookAtConfig fields are named *_deg, and defaults are specified in degrees, but the doc comments say “(radians)”. This is misleading and makes it easy to misuse these values. Update the comments to consistently describe degrees (or rename fields if they are meant to be radians).

Suggested change
/// Max eye bone Y-rotation (radians) for looking left/right (inward)
horizontal_inner_deg: f32,
/// Max eye bone Y-rotation (radians) for looking left/right (outward)
horizontal_outer_deg: f32,
/// Max eye bone X-rotation (radians) for looking up
vertical_up_deg: f32,
/// Max eye bone X-rotation (radians) for looking down
/// Max eye bone Y-rotation in degrees for looking left/right (inward)
horizontal_inner_deg: f32,
/// Max eye bone Y-rotation in degrees for looking left/right (outward)
horizontal_outer_deg: f32,
/// Max eye bone X-rotation in degrees for looking up
vertical_up_deg: f32,
/// Max eye bone X-rotation in degrees for looking down

Comment on lines +1619 to 1622
// Swap camera to the new render target
commands.entity(state.camera_entity).insert(
RenderTarget::Image(new_rt_handle.clone().into()),
);

Copilot AI Mar 2, 2026


Resize swaps the camera to a new render target and then respawns the readback entity via spawn_readback_entity(...), which immediately inserts the one-shot Readback component. This bypasses the new gating (only start readback once slot is active + model_loaded + camera active) and can reintroduce wasted readback / uninitialized-texture artifacts if a resize happens before the model is ready. Consider respawning with the non-starting variant and letting ensure_continuous_readback enable readback when the slot is ready (or compute start_active from the slot state/camera activity).
