Bevy avatar rendering + LiveKit video pipeline by joelteply · Pull Request #274 · CambrianTech/continuum

joelteply · 2026-02-26T19:08:41Z

Summary

Headless Bevy 0.18 renderer for VRM avatar models (16 slots, 640×480 @24fps)
Gender-coherent avatar selection with deterministic identity traits from userId hash
TTS voice gender matching (male/female models map to appropriate TTS voices)
LiveKit agent video pipeline: camera capture → avatar render → RGBA→I420 → video publish
IPC connection pool for Rust↔TypeScript communication
Parallel startup pipeline, VAD fix, phased LiveKit agent connections
Live voice transcriptions persisted to chat_messages
18 CC0 VRM models (7M/11F) from VRoid + 100Avatars collections
All Rust workspace deps upgraded to latest (zero warnings)
ORM DbHandle defaults removed, persona DB handle propagation fixed
LiveWidget decomposed into proper Lit sub-components
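
The RGBA→I420 step in the LiveKit agent video pipeline above can be sketched as follows. This is a minimal hand-rolled conversion, assuming tightly packed RGBA8 input, even frame dimensions, and BT.601 limited-range integer coefficients. The PR does not say which coefficients or conversion routine it uses, so the function name `rgba_to_i420` and everything inside it are illustrative assumptions.

```rust
/// Convert a tightly packed RGBA8 frame to planar I420 (Y, U, V).
/// Assumption: BT.601 limited-range coefficients; alpha is ignored.
pub fn rgba_to_i420(rgba: &[u8], width: usize, height: usize) -> (Vec<u8>, Vec<u8>, Vec<u8>) {
    assert!(width % 2 == 0 && height % 2 == 0, "I420 needs even dimensions");
    assert_eq!(rgba.len(), width * height * 4, "expected tightly packed RGBA8");

    let (cw, ch) = (width / 2, height / 2);
    let mut y_plane = vec![0u8; width * height];
    let mut u_plane = vec![0u8; cw * ch];
    let mut v_plane = vec![0u8; cw * ch];

    // Luma: one sample per pixel.
    for p in 0..width * height {
        let (r, g, b) = (rgba[p * 4] as i32, rgba[p * 4 + 1] as i32, rgba[p * 4 + 2] as i32);
        let y = ((66 * r + 129 * g + 25 * b + 128) >> 8) + 16;
        y_plane[p] = y.clamp(0, 255) as u8;
    }

    // Chroma: one sample per 2x2 block, computed from the block's average colour.
    for cy in 0..ch {
        for cx in 0..cw {
            let (mut r, mut g, mut b) = (0i32, 0i32, 0i32);
            for dy in 0..2 {
                for dx in 0..2 {
                    let i = ((cy * 2 + dy) * width + cx * 2 + dx) * 4;
                    r += rgba[i] as i32;
                    g += rgba[i + 1] as i32;
                    b += rgba[i + 2] as i32;
                }
            }
            let (r, g, b) = (r / 4, g / 4, b / 4);
            let u = ((-38 * r - 74 * g + 112 * b + 128) >> 8) + 128;
            let v = ((112 * r - 94 * g - 18 * b + 128) >> 8) + 128;
            u_plane[cy * cw + cx] = u.clamp(0, 255) as u8;
            v_plane[cy * cw + cx] = v.clamp(0, 255) as u8;
        }
    }
    (y_plane, u_plane, v_plane)
}
```

At 640×480 this yields a 307,200-byte Y plane and two 76,800-byte chroma planes per frame, which is the planar layout an I420 video frame expects.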

16 commits

Confirmed working at d101d32 — STT, subtitles, AI responses all function.

Test plan

Process stays alive (no SIGTRAP/SIGABRT)
STT works (subtitles appear)
AIs respond with avatar lip-sync
All avatar models load and render
Voice gender matching works correctly

…ideo pipeline

- ORMRustClient: Single socket → 4-connection pool with least-busy routing
  Eliminates IPC serialization bottleneck (all agents competed for 1 pipe)
- Bevy headless renderer: VRM avatar models rendered at 5fps into render targets
  Single shared directional light (fixes Bevy 10-light limit)
- LiveKit agents: Pre-created on room join with 500ms stagger (not on first speak)
- LiveKit agents publish video tracks (640x480) from Bevy frame readback
- LiveKit server: --node-ip 127.0.0.1 fixes ICE candidate negotiation on localhost
- LiveJoinServerCommand: Stale call detection on server restart
- LiveWidget: Video track attachment for remote participants
- data-clear: Clears calls collection to avoid stale LiveKit room references
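
The least-busy routing mentioned for the connection pool can be illustrated with a small sketch. ORMRustClient's actual implementation is not shown in this PR, so `ConnPool`, `PooledConn`, and the in-flight counter are hypothetical names. The sketch is written in Rust only to show the selection rule, not the real socket handling.

```rust
/// Hypothetical stand-in for one pooled IPC connection.
#[derive(Debug, Default)]
pub struct PooledConn {
    /// Requests sent on this connection that have not completed yet.
    pub in_flight: usize,
}

/// Hypothetical fixed-size pool that routes each request to the least-busy connection.
pub struct ConnPool {
    conns: Vec<PooledConn>,
}

impl ConnPool {
    pub fn new(size: usize) -> Self {
        assert!(size > 0, "pool needs at least one connection");
        Self { conns: (0..size).map(|_| PooledConn::default()).collect() }
    }

    /// Pick the connection with the fewest in-flight requests
    /// (lowest index wins ties) and count one more request on it.
    pub fn acquire(&mut self) -> usize {
        let idx = self
            .conns
            .iter()
            .enumerate()
            .min_by_key(|(_, c)| c.in_flight)
            .map(|(i, _)| i)
            .expect("pool is never empty");
        self.conns[idx].in_flight += 1;
        idx
    }

    /// Mark one request on connection `idx` as finished.
    pub fn release(&mut self, idx: usize) {
        let conn = &mut self.conns[idx];
        conn.in_flight = conn.in_flight.saturating_sub(1);
    }

    pub fn in_flight(&self, idx: usize) -> usize {
        self.conns[idx].in_flight
    }
}
```

With four connections, four concurrent callers each get their own pipe instead of queueing on a single one. A fifth caller lands on whichever connection finishes first.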

Human transcriptions from live calls were ephemeral (events only). Now saved as ChatMessageEntity with sourceModality:'voice' metadata. Cached callSessionId→roomId lookup avoids repeated DB queries. chat/export --room="general" now includes both text and voice messages.

Startup:
- parallel-start.sh: concurrent TS + Rust builds (61s vs 130s)
- system-stop.sh: comprehensive process cleanup (tmux, ports, sockets)
- SystemOrchestrator: ping-based bootstrap check prevents white screen
- start-workers.sh: --skip-build flag when caller already built

Voice/LiveKit fixes:
- WebRTC VAD: fix aggressiveness mapping (all levels were VERY_AGGRESSIVE)
- ProductionVAD: add debug logging for speech detection pipeline
- voice module: phased agent connection (STT first, 2s stagger between agents)

Avatar rendering:
- MAX_AVATAR_SLOTS 14->24, use constant instead of hardcoded slot check
- Dark fallback for avatar tiles, video container CSS with fade-in
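
The VAD fix listed under Voice/LiveKit fixes amounts to giving each aggressiveness level its own mode. A minimal sketch, assuming levels 0 to 3 and a local `VadMode` enum whose variants mirror the four WebRTC VAD modes. The enum, the function name `vad_mode_for`, and the clamping of out-of-range levels are assumptions, and the original buggy code is not shown in the PR.

```rust
/// Local stand-in for the four WebRTC VAD operating modes (assumed names).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum VadMode {
    Quality,
    LowBitrate,
    Aggressive,
    VeryAggressive,
}

/// Map an aggressiveness level to a distinct mode.
/// Assumption: levels above 3 clamp to the most aggressive mode.
pub fn vad_mode_for(level: u8) -> VadMode {
    match level {
        0 => VadMode::Quality,
        1 => VadMode::LowBitrate,
        2 => VadMode::Aggressive,
        _ => VadMode::VeryAggressive,
    }
}
```

A table-driven test over all four levels is a cheap guard against the mapping collapsing to a single mode again.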

- Remove `= 'default'` from all 15 ORM method signatures: handle is now REQUIRED, so the compiler catches every missing handle
- Fix 32 callsites across commands/daemons/system with explicit 'default' for shared-data operations
- Fix persona handle propagation: Hippocampus opens longterm.db and now exposes the handle via waitForDbInit(). LimbicSystem.propagateDbHandle() pushes it to PersonaMemory after init. Fixes 14 personas thrashing main DB causing 27s IPC timeouts.

ts-rs 9→12 (eliminates 68 serde parse warnings), candle 0.8→0.9, rusqlite 0.32→0.38, tonic/prost 0.11/0.12→0.14, safetensors 0.4→0.7, fastembed 4→5, ort rc.9→rc.11, thiserror 1→2, tokio-tungstenite 0.21→0.28, half 2.3→2.7, hf-hub 0.4→0.5, tokenizers 0.20→0.22, ndarray 0.16→0.17

Split 1431-line monolith into orchestrator + 3 sub-components:
- LiveParticipantTile: owns video-container in its own shadow DOM
- LiveControls: SVG icons + media buttons, fires events up
- LiveCaptions: multi-speaker transcription with auto-fade

SCSS split into per-component files. LiveWidget retains session lifecycle, AudioStreamClient, state persistence, layout decisions.

- LiveParticipantTile: render videoElement directly in Lit template instead of imperative querySelector + appendChild
- LiveWidget: bind all child state via Lit properties
- Use repeat() directive for keyed participant lists
- Use ref() directive for captions/controls refs

Generated .css, .styles.ts, .css.map files from compile-sass.ts should not be checked in — they're regenerated by npm start.

- Render at 1280x720 (was 640x480) for crisp video conference tiles
- Disable WebRTC simulcast, set 2.5Mbps explicit bitrate to prevent adaptive compression blur on avatar video tracks
- Load 3D VRM model at connect time, not deferred to first speech
- Add deterministic_pick(id, options, salt) using FNV-1a hash for stable trait selection from any array given a unique ID
- Derive avatar gender from persona identity when voice isn't known, so model selection is immediate and consistent
- Remove transcription-to-chat persistence (transcriptions are live captions, not chat messages) and add browser-side dedup
- Add ResizeObserver on tiles + data channel for future dynamic resize
- Add ts-rs generated TileResolution/ResolutionTierWire types
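
The `deterministic_pick(id, options, salt)` helper described above can be sketched as follows. The FNV-1a constants are the standard 64-bit ones. The hash width, the way the salt is mixed in (hashing `"id:salt"`), and the `Option` return type are assumptions, since the PR gives only the name and argument order.

```rust
const FNV_OFFSET_BASIS: u64 = 0xcbf2_9ce4_8422_2325;
const FNV_PRIME: u64 = 0x0000_0100_0000_01b3;

/// 64-bit FNV-1a: XOR each byte into the state, then multiply by the prime.
pub fn fnv1a_64(bytes: &[u8]) -> u64 {
    let mut hash = FNV_OFFSET_BASIS;
    for &b in bytes {
        hash ^= b as u64;
        hash = hash.wrapping_mul(FNV_PRIME);
    }
    hash
}

/// Pick one option for `id`, stable across runs and processes.
/// Assumption: the salt is mixed in by hashing the string "id:salt",
/// so different traits of the same id can land on different indices.
pub fn deterministic_pick<'a, T>(id: &str, options: &'a [T], salt: &str) -> Option<&'a T> {
    if options.is_empty() {
        return None;
    }
    let key = format!("{id}:{salt}");
    let idx = (fnv1a_64(key.as_bytes()) % options.len() as u64) as usize;
    options.get(idx)
}
```

Because the result depends only on the id, the salt, and the option list, the same persona keeps the same trait across restarts as long as the list itself is unchanged. Reordering or resizing the list reshuffles the picks.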

… modular architecture

- Avatar selection enforces gender coherence: avatar gender always matches voice gender
- Single source of truth: gender_from_identity(user_id) seeds both avatar and TTS voice
- Gender is a SEED default, designed for future override via user.state preferences
- TTS gender_hint flows through entire chain: speak_in_call → tts_service → tts::synthesize
- resolve_voice_gendered() filters voice catalog by gender before deterministic hash
- FPS bumped 15→24, mouth weight windows 200ms→66ms, amplitude scaling 0.7→1.0
- Video encoding: 800kbps→1.2Mbps, 15fps→24fps for smoother lip sync
- VRM 1.0 models (169 joints) filtered from catalog because Bevy can't render >128 joints
- Modular avatar/ module: catalog, selection, gender, frame_analysis, backends, types
- Health check at frame 150/300 detects Empty/BrokenGeometry (log only, no fallback)
- 76 avatar tests + 6 TTS service tests passing
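
The filter-then-hash selection described above can be sketched as follows: derive a gender from the user ID, narrow the catalog to renderable entries of that gender, then pick by hash. `gender_from_identity` is named in the commit message, but its body here (parity of an FNV-1a hash) is an assumption. `CatalogEntry`, `pick_for`, and `MAX_JOINTS` are hypothetical names introduced for illustration.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Gender {
    Male,
    Female,
}

/// Hypothetical catalog row; the same shape could describe avatars or TTS voices.
pub struct CatalogEntry {
    pub name: &'static str,
    pub gender: Gender,
    pub joints: usize,
}

/// Joint limit taken from the commit message (more than 128 joints cannot be rendered).
pub const MAX_JOINTS: usize = 128;

fn fnv1a_64(bytes: &[u8]) -> u64 {
    let mut hash: u64 = 0xcbf2_9ce4_8422_2325;
    for &b in bytes {
        hash ^= b as u64;
        hash = hash.wrapping_mul(0x0000_0100_0000_01b3);
    }
    hash
}

/// Assumption: the seed gender is the parity of the user ID's hash.
pub fn gender_from_identity(user_id: &str) -> Gender {
    if fnv1a_64(user_id.as_bytes()) % 2 == 0 {
        Gender::Male
    } else {
        Gender::Female
    }
}

/// Filter by gender and joint limit first, then index the survivors by hash.
pub fn pick_for<'a>(
    user_id: &str,
    gender: Gender,
    catalog: &'a [CatalogEntry],
    salt: &str,
) -> Option<&'a CatalogEntry> {
    let candidates: Vec<&'a CatalogEntry> = catalog
        .iter()
        .filter(|e| e.gender == gender && e.joints <= MAX_JOINTS)
        .collect();
    if candidates.is_empty() {
        return None;
    }
    let key = format!("{user_id}:{salt}");
    let idx = (fnv1a_64(key.as_bytes()) % candidates.len() as u64) as usize;
    Some(candidates[idx])
}
```

Calling `pick_for` once per catalog (avatars and voices) with the same `gender_from_identity(user_id)` result keeps the two choices coherent. A future user preference would only need to replace that one seed value.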

Reduces male model sharing from 4 duplicates to 1. 9 male agents now have 7 unique models to draw from. Both models are VRM 0.x with full mouth morph targets for lip sync.

- wv-sakurada2: Sakurada Fumiriya variant (107 joints, M00_ materials)
- wv-shilo: Hand-rigged male (85 joints, sideburn bones, 17 blend shapes)

Copilot

Copilot wasn't able to review this pull request because it exceeds the maximum number of lines (20,000). Try reducing the number of changed lines and requesting a review from Copilot again.

joelteply added 16 commits February 22, 2026 23:07

gestures work

46f8b25

fixed a lot of 3d stuff

3103d6d

minor tweaks

ef34330

Remove generated CSS/styles.ts from tracking, update gitignore

086656c

decent video conference

3931ff4

good looking models, filled out, one repeat

7b5f123

Copilot AI review requested due to automatic review settings February 26, 2026 19:08

joelteply merged commit 642fdfc into main Feb 26, 2026
1 check passed

Copilot AI reviewed Feb 26, 2026

joelteply deleted the feature/bevy-avatars branch February 26, 2026 19:09

github-actions bot added the size: XL label Feb 26, 2026

joelteply merged 16 commits into main from feature/bevy-avatars
