Bevy avatar rendering + LiveKit video pipeline#274

Merged
joelteply merged 16 commits into main from feature/bevy-avatars on Feb 26, 2026

Summary

  • Headless Bevy 0.18 renderer for VRM avatar models (16 slots, 640×480 @24fps)
  • Gender-coherent avatar selection with deterministic identity traits from userId hash
  • TTS voice gender matching (male/female models map to appropriate TTS voices)
  • LiveKit agent video pipeline: camera capture → avatar render → RGBA→I420 → video publish
  • IPC connection pool for Rust↔TypeScript communication
  • Parallel startup pipeline, VAD fix, phased LiveKit agent connections
  • Live voice transcriptions persisted to chat_messages
  • 18 CC0 VRM models (7M/11F) from VRoid + 100Avatars collections
  • All Rust workspace deps upgraded to latest (zero warnings)
  • ORM DbHandle defaults removed, persona DB handle propagation fixed
  • LiveWidget decomposed into proper Lit sub-components
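
The RGBA→I420 step in the video pipeline above can be sketched as follows. This is a minimal illustration, not the PR's actual converter: the function name `rgba_to_i420`, the BT.601 limited-range integer coefficients, and the 2×2 chroma averaging are all assumptions chosen for clarity.

```rust
/// Convert a tightly packed RGBA8 frame into planar I420 (Y plane, then U, then V).
/// Hypothetical helper: assumes even dimensions and BT.601 limited-range coefficients.
pub fn rgba_to_i420(rgba: &[u8], width: usize, height: usize) -> Vec<u8> {
    assert!(width % 2 == 0 && height % 2 == 0, "I420 needs even dimensions");
    assert_eq!(rgba.len(), width * height * 4, "unexpected RGBA buffer size");

    let y_size = width * height;
    let c_size = (width / 2) * (height / 2);
    let mut out = vec![0u8; y_size + 2 * c_size];
    let (y_plane, chroma) = out.split_at_mut(y_size);
    let (u_plane, v_plane) = chroma.split_at_mut(c_size);

    // Luma: one sample per pixel; alpha is ignored.
    for (i, px) in rgba.chunks_exact(4).enumerate() {
        let (r, g, b) = (px[0] as i32, px[1] as i32, px[2] as i32);
        y_plane[i] = (((66 * r + 129 * g + 25 * b + 128) >> 8) + 16).clamp(0, 255) as u8;
    }

    // Chroma: one U and one V sample per 2x2 block, from the block's average color.
    for by in 0..height / 2 {
        for bx in 0..width / 2 {
            let (mut r, mut g, mut b) = (0i32, 0i32, 0i32);
            for dy in 0..2 {
                for dx in 0..2 {
                    let idx = ((by * 2 + dy) * width + bx * 2 + dx) * 4;
                    r += rgba[idx] as i32;
                    g += rgba[idx + 1] as i32;
                    b += rgba[idx + 2] as i32;
                }
            }
            let (r, g, b) = (r / 4, g / 4, b / 4);
            let ci = by * (width / 2) + bx;
            u_plane[ci] = (((-38 * r - 74 * g + 112 * b + 128) >> 8) + 128).clamp(0, 255) as u8;
            v_plane[ci] = (((112 * r - 94 * g - 18 * b + 128) >> 8) + 128).clamp(0, 255) as u8;
        }
    }
    out
}
```

The output buffer holds `width * height * 3 / 2` bytes, so a 640×480 frame shrinks from 1,228,800 bytes of RGBA to 460,800 bytes of I420 before it is handed to the video track.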

16 commits

Confirmed working at d101d32 — STT, subtitles, AI responses all function.

Test plan

  • Process stays alive (no SIGTRAP/SIGABRT)
  • STT works (subtitles appear)
  • AIs respond with avatar lip-sync
  • All avatar models load and render
  • Voice gender matching works correctly

…ideo pipeline

- ORMRustClient: Single socket → 4-connection pool with least-busy routing
  Eliminates IPC serialization bottleneck (all agents competed for 1 pipe)
- Bevy headless renderer: VRM avatar models rendered at 5fps into render targets
  Single shared directional light (fixes Bevy 10-light limit)
- LiveKit agents: Pre-created on room join with 500ms stagger (not on first speak)
- LiveKit agents publish video tracks (640x480) from Bevy frame readback
- LiveKit server: --node-ip 127.0.0.1 fixes ICE candidate negotiation on localhost
- LiveJoinServerCommand: Stale call detection on server restart
- LiveWidget: Video track attachment for remote participants
- data-clear: Clears calls collection to avoid stale LiveKit room references
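
Least-busy routing over a small pool, as described for ORMRustClient above, might look roughly like this. It is a language-neutral sketch written in Rust: the types `Pool` and `PooledConn` and the `in_flight` counter are hypothetical, and the real client may track load differently.

```rust
/// One pooled connection; `in_flight` counts requests sent but not yet answered.
#[derive(Debug, Default)]
pub struct PooledConn {
    pub in_flight: usize,
}

/// Hypothetical fixed-size pool (the commit above uses 4 connections).
#[derive(Debug)]
pub struct Pool {
    conns: Vec<PooledConn>,
}

impl Pool {
    pub fn new(size: usize) -> Self {
        assert!(size > 0, "pool must hold at least one connection");
        Self { conns: (0..size).map(|_| PooledConn::default()).collect() }
    }

    /// Pick the connection with the fewest in-flight requests
    /// (ties go to the lowest index) and count one more request on it.
    pub fn acquire(&mut self) -> usize {
        let idx = self
            .conns
            .iter()
            .enumerate()
            .min_by_key(|(_, c)| c.in_flight)
            .map(|(i, _)| i)
            .expect("pool is non-empty");
        self.conns[idx].in_flight += 1;
        idx
    }

    /// Mark one request on connection `idx` as finished.
    pub fn release(&mut self, idx: usize) {
        let c = &mut self.conns[idx];
        c.in_flight = c.in_flight.saturating_sub(1);
    }
}
```

With four connections, four concurrent callers each get their own pipe instead of queueing behind a single socket, which is the bottleneck the commit describes.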

Human transcriptions from live calls were ephemeral (events only).
Now saved as ChatMessageEntity with sourceModality:'voice' metadata.
Cached callSessionId→roomId lookup avoids repeated DB queries.
chat/export --room="general" now includes both text and voice messages.
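
The cached callSessionId→roomId lookup can be pictured as a small memoizing map. The sketch below is an assumption about the shape of that cache, not the PR's code: `RoomIdCache` and `get_or_lookup` are hypothetical names, and the closure stands in for the database query.

```rust
use std::collections::HashMap;

/// Hypothetical in-memory cache from call session ID to room ID.
#[derive(Debug, Default)]
pub struct RoomIdCache {
    map: HashMap<String, String>,
}

impl RoomIdCache {
    pub fn new() -> Self {
        Self::default()
    }

    /// Return the cached room ID, or run `lookup` (e.g. a DB query) and
    /// remember its result. Failed lookups (`None`) are not cached.
    pub fn get_or_lookup<F>(&mut self, call_session_id: &str, lookup: F) -> Option<String>
    where
        F: FnOnce(&str) -> Option<String>,
    {
        if let Some(room) = self.map.get(call_session_id) {
            return Some(room.clone());
        }
        let room = lookup(call_session_id)?;
        self.map.insert(call_session_id.to_owned(), room.clone());
        Some(room)
    }
}
```

Because every transcription in a call carries the same session ID, only the first one pays for the query.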

Startup:
- parallel-start.sh: concurrent TS + Rust builds (61s vs 130s)
- system-stop.sh: comprehensive process cleanup (tmux, ports, sockets)
- SystemOrchestrator: ping-based bootstrap check prevents white screen
- start-workers.sh: --skip-build flag when caller already built

Voice/LiveKit fixes:
- WebRTC VAD: fix aggressiveness mapping (all levels were VERY_AGGRESSIVE)
- ProductionVAD: add debug logging for speech detection pipeline
- voice module: phased agent connection (STT first, 2s stagger between agents)
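
To make the two voice fixes above concrete, here is a hedged sketch. `VadMode` is a local stand-in for the four WebRTC VAD operating modes rather than a type from any crate, and both function names are hypothetical. The stagger schedule assumes the STT listener connects at time zero and each agent follows 2 s after the previous connection.

```rust
use std::time::Duration;

/// Local stand-in for the four WebRTC VAD operating modes (0 = least aggressive).
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum VadMode {
    Quality,
    LowBitrate,
    Aggressive,
    VeryAggressive,
}

/// Map a numeric aggressiveness level to a distinct mode.
/// The bug described above collapsed every level into `VeryAggressive`;
/// here only out-of-range levels fall back to it (an assumption).
pub fn vad_mode_for_level(level: u8) -> VadMode {
    match level {
        0 => VadMode::Quality,
        1 => VadMode::LowBitrate,
        2 => VadMode::Aggressive,
        _ => VadMode::VeryAggressive,
    }
}

/// Connection offsets for a phased start: index 0 is the STT listener,
/// followed by one entry per agent, spaced 2 s apart.
pub fn connect_offsets(agent_count: usize) -> Vec<Duration> {
    const STAGGER: Duration = Duration::from_secs(2);
    (0..=agent_count as u32).map(|i| STAGGER * i).collect()
}
```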

Avatar rendering:
- MAX_AVATAR_SLOTS 14->24, use constant instead of hardcoded slot check
- Dark fallback for avatar tiles, video container CSS with fade-in

- Remove `= 'default'` from all 15 ORM method signatures — handle is now
  REQUIRED, compiler catches every missing handle
- Fix 32 callsites across commands/daemons/system with explicit 'default'
  for shared-data operations
- Fix persona handle propagation: Hippocampus opens longterm.db and now
  exposes the handle via waitForDbInit(). LimbicSystem.propagateDbHandle()
  pushes it to PersonaMemory after init. Fixes 14 personas thrashing main
  DB causing 27s IPC timeouts.

ts-rs 9→12 (eliminates 68 serde parse warnings), candle 0.8→0.9,
rusqlite 0.32→0.38, tonic/prost 0.11/0.12→0.14, safetensors 0.4→0.7,
fastembed 4→5, ort rc.9→rc.11, thiserror 1→2, tokio-tungstenite 0.21→0.28,
half 2.3→2.7, hf-hub 0.4→0.5, tokenizers 0.20→0.22, ndarray 0.16→0.17

Split 1431-line monolith into orchestrator + 3 sub-components:
- LiveParticipantTile: owns video-container in its own shadow DOM
- LiveControls: SVG icons + media buttons, fires events up
- LiveCaptions: multi-speaker transcription with auto-fade

SCSS split into per-component files. LiveWidget retains session
lifecycle, AudioStreamClient, state persistence, layout decisions.

- LiveParticipantTile: render videoElement directly in Lit template
  instead of imperative querySelector + appendChild
- LiveWidget: bind all child state via Lit properties
- Use repeat() directive for keyed participant lists
- Use ref() directive for captions/controls refs

Generated .css, .styles.ts, .css.map files from compile-sass.ts
should not be checked in — they're regenerated by npm start.

- Render at 1280x720 (was 640x480) for crisp video conference tiles
- Disable WebRTC simulcast, set 2.5Mbps explicit bitrate to prevent
  adaptive compression blur on avatar video tracks
- Load 3D VRM model at connect time, not deferred to first speech
- Add deterministic_pick(id, options, salt) using FNV-1a hash for
  stable trait selection from any array given a unique ID
- Derive avatar gender from persona identity when voice isn't known,
  so model selection is immediate and consistent
- Remove transcription-to-chat persistence (transcriptions are live
  captions, not chat messages) and add browser-side dedup
- Add ResizeObserver on tiles + data channel for future dynamic resize
- Add ts-rs generated TileResolution/ResolutionTierWire types
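
A minimal sketch of `deterministic_pick(id, options, salt)` built on FNV-1a follows. The signature comes from the commit message above, but the 64-bit hash width, the zero-byte separator between `id` and `salt`, and the `Option` return for an empty slice are assumptions rather than the PR's actual choices.

```rust
const FNV_OFFSET: u64 = 0xcbf2_9ce4_8422_2325;
const FNV_PRIME: u64 = 0x0000_0100_0000_01b3;

/// 64-bit FNV-1a: XOR each byte into the hash, then multiply by the prime.
fn fnv1a(bytes: impl IntoIterator<Item = u8>) -> u64 {
    bytes
        .into_iter()
        .fold(FNV_OFFSET, |h, b| (h ^ b as u64).wrapping_mul(FNV_PRIME))
}

/// Pick one element of `options` that is stable for a given (`id`, `salt`) pair.
/// Different salts let one ID drive several independent traits.
pub fn deterministic_pick<'a, T>(id: &str, options: &'a [T], salt: &str) -> Option<&'a T> {
    if options.is_empty() {
        return None;
    }
    // Assumed mixing scheme: id bytes, a zero separator, then salt bytes.
    let bytes = id.bytes().chain(std::iter::once(0u8)).chain(salt.bytes());
    let idx = (fnv1a(bytes) % options.len() as u64) as usize;
    options.get(idx)
}
```

The same ID and salt always select the same entry, so an avatar trait stays stable across restarts as long as the options array keeps its order and length.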
… modular architecture

- Avatar selection enforces gender coherence: avatar gender always matches voice gender
- Single source of truth: gender_from_identity(user_id) seeds both avatar and TTS voice
- Gender is a SEED default — designed for future override via user.state preferences
- TTS gender_hint flows through entire chain: speak_in_call → tts_service → tts::synthesize
- resolve_voice_gendered() filters voice catalog by gender before deterministic hash
- FPS bumped 15→24, mouth weight windows 200ms→66ms, amplitude scaling 0.7→1.0
- Video encoding: 800kbps→1.2Mbps, 15fps→24fps for smoother lip sync
- VRM 1.0 models (169 joints) filtered from catalog — Bevy can't render >128 joints
- Modular avatar/ module: catalog, selection, gender, frame_analysis, backends, types
- Health check at frame 150/300 detects Empty/BrokenGeometry (log only, no fallback)
- 76 avatar tests + 6 TTS service tests passing
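
The gender-coherence rules above can be sketched as one seed function feeding two filtered picks. Only `gender_from_identity` is named in the commit; `pick_avatar`, `pick_voice`, the struct layouts, the even/odd hash split, and the use of FNV-1a here are illustrative assumptions. The 128-joint cap reflects the limit stated in the commit message.

```rust
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum Gender { Male, Female }

#[derive(Debug)]
pub struct AvatarModel { pub name: &'static str, pub gender: Gender, pub joints: u32 }

#[derive(Debug)]
pub struct Voice { pub name: &'static str, pub gender: Gender }

/// Joint cap taken from the commit message (models above it are filtered out).
const MAX_JOINTS: u32 = 128;

fn fnv1a(s: &str) -> u64 {
    s.bytes()
        .fold(0xcbf2_9ce4_8422_2325u64, |h, b| (h ^ b as u64).wrapping_mul(0x0000_0100_0000_01b3))
}

/// Seed default only: a stored user preference could override this later.
pub fn gender_from_identity(user_id: &str) -> Gender {
    if fnv1a(user_id) % 2 == 0 { Gender::Male } else { Gender::Female }
}

/// Deterministic choice among already-filtered candidates.
fn pick<'a, T>(user_id: &str, salt: &str, eligible: Vec<&'a T>) -> Option<&'a T> {
    if eligible.is_empty() {
        return None;
    }
    let idx = (fnv1a(&format!("{user_id}:{salt}")) % eligible.len() as u64) as usize;
    Some(eligible[idx])
}

/// Filter by seeded gender and joint cap first, then hash into what remains.
pub fn pick_avatar<'a>(user_id: &str, catalog: &'a [AvatarModel]) -> Option<&'a AvatarModel> {
    let gender = gender_from_identity(user_id);
    let eligible = catalog
        .iter()
        .filter(|m| m.gender == gender && m.joints <= MAX_JOINTS)
        .collect();
    pick(user_id, "avatar", eligible)
}

/// Same seed, so the voice gender always matches the avatar gender.
pub fn pick_voice<'a>(user_id: &str, voices: &'a [Voice]) -> Option<&'a Voice> {
    let gender = gender_from_identity(user_id);
    let eligible = voices.iter().filter(|v| v.gender == gender).collect();
    pick(user_id, "voice", eligible)
}
```

Filtering before hashing is what keeps the result coherent: hashing into the full catalog and then checking gender would either mismatch or need retries.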

Reduces male model sharing from 4 duplicates to 1. 9 male agents now
have 7 unique models to draw from. Both models are VRM 0.x with full
mouth morph targets for lip sync.

- wv-sakurada2: Sakurada Fumiriya variant (107 joints, M00_ materials)
- wv-shilo: Hand-rigged male (85 joints, sideburn bones, 17 blend shapes)
Copilot AI review requested due to automatic review settings February 26, 2026 19:08
@joelteply joelteply merged commit 642fdfc into main Feb 26, 2026
1 check passed
Copilot AI left a comment

Copilot wasn't able to review this pull request because it exceeds the maximum number of lines (20,000). Try reducing the number of changed lines and requesting a review from Copilot again.

@joelteply joelteply deleted the feature/bevy-avatars branch February 26, 2026 19:09