
Cameron Voice Agent Integration Guide

This document is the canonical reference for integrating Cameron's ElevenLabs voice agent React module into the main cast.dread.technology podcast player app. It is intended for engineers new to the codebase, with detailed walkthroughs, context, and a step-by-step plan.


1. Overview

  • Goal: Seamlessly add a browser-based, hyper-realistic voice chat agent ("Elon") to the podcast player. The agent should use the last N minutes of transcript context (relative to the playhead) as its system prompt.
  • User Experience:
    • User can listen to the podcast and view a synced transcript.
    • At any point, user can click a button to "Call Elon" from the current playhead.
    • The last N minutes of transcript are sent as context to the ElevenLabs agent.
    • Audio pauses, and a voice chat UI appears for the call.

2. Codebase Walkthrough

A. Main App (frontend/)

  • src/pages/App.jsx: Main app logic. Loads config, transcript, manages playhead, renders Player, Transcript, and SidebarDebug.
  • src/components/Player.jsx: Audio player. Reports playhead updates.
  • src/components/Transcript.jsx: Transcript viewer. Highlights and scrolls to the current line.
  • src/components/SidebarDebug.jsx: Shows the last N minutes of transcript context (adjustable, for debugging and context selection).
  • Transcript Data: Loaded as an array of objects. Each has timestamp, tts (text), speaker, and time fields.
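For orientation, a transcript entry with the fields listed above might look like the following. The values, and the exact meaning of the time field, are illustrative assumptions, not taken from real data:

```javascript
// Hypothetical transcript entry; field names match the list above,
// values are made up. `time` is assumed to be the position in seconds.
const sampleEntry = {
  timestamp: "00:12:34",        // position in the episode
  tts: "I think it's time...",  // the spoken text for this line
  speaker: "Joe",               // speaker label; may be missing
  time: 754                     // 12 * 60 + 34 seconds (assumed)
};
```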

B. Cameron's Voice Agent (cameron_magic_component_example/)

  • src/voiceAgent.js:
    • Exports a useVoiceAgent(systemPrompt, prefilledMessages) hook and a VoiceAgent component.
    • Handles connection to ElevenLabs agent, microphone access, error handling, and conversation events.
    • Accepts a systemPrompt (string) and prefilledMessages (JSON array, optional).
  • src/App.js: Demo UI for the voice agent. Lets user set prompt/messages, start/end conversation, and see connection state/errors.
  • Integration Point: The key input to provide is a single string system prompt: the last N minutes of transcript, with speaker labels and no timestamps.
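As a sketch of that integration point: the main app only needs to assemble the two arguments the hook takes. buildAgentArgs below is a hypothetical helper, not part of Cameron's module:

```javascript
// Hypothetical helper: package the transcript context into the arguments
// that useVoiceAgent(systemPrompt, prefilledMessages) expects.
function buildAgentArgs(contextString) {
  return {
    systemPrompt: contextString, // "Speaker: text" lines, no timestamps
    prefilledMessages: []        // optional; empty when starting a fresh call
  };
}
```

The main app would then call useVoiceAgent(args.systemPrompt, args.prefilledMessages) inside a React component.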

3. Integration Plan (Step-by-Step)

  1. Context Extraction

    • Use the logic from SidebarDebug to extract the last N minutes of transcript as a string.
    • Format: Each line as Speaker: transcript text. (No timestamps.)
    • Example:
      Joe: I think it's time...
      Elon: Absolutely.
      Unknown: What do you mean?
      
  2. UI Integration

    • Add a prominent button to the main app: "Call Elon from here" (visible above the transcript or in the sidebar).
    • When clicked:
      • Pause audio playback.
      • Lock in the current playhead and extract the last N minutes of context.
      • Render the voice agent UI (from Cameron's module) in a modal or dedicated panel.
  3. Voice Agent Embedding

    • Import useVoiceAgent and/or VoiceAgent from Cameron's code into the main app.
    • Pass the extracted context as the systemPrompt prop.
    • Handle errors and connection state in the main UI (surface errors clearly to the user).
  4. State Management

    • While in a call, disable audio controls and transcript seeking.
    • When the call ends, allow resuming playback from the same position.
  5. Testing & Debugging

    • Use the SidebarDebug component to verify the context string being sent.
    • Add logging and error boundaries as needed for robust UX.
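The call lifecycle in steps 2-4 can be sketched as two small state transitions. Everything here is an assumption about shape, not existing code: the names startCall/endCall and the state fields are hypothetical, and getContext stands in for the context-extraction logic:

```javascript
// Hypothetical sketch of the call lifecycle (steps 2-4 above).
// `getContext` is injected so the sketch stays framework-agnostic.
function startCall(state, getContext) {
  return {
    ...state,
    inCall: true,                             // disable audio controls & seeking
    pausedAt: state.playhead,                 // lock in the current position
    systemPrompt: getContext(state.playhead)  // last N minutes of transcript
  };
}

function endCall(state) {
  // Resume playback from the position locked in when the call started.
  return { ...state, inCall: false, playhead: state.pausedAt };
}
```

In the real app these transitions would live in React state (e.g. a reducer), with startCall also pausing the audio element.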

4. Code Snippet: Context Extraction

// In App.jsx or a utils file

// Convert a timestamp string into seconds. Assumes "HH:MM:SS" or "MM:SS"
// format (this helper was referenced but not defined in the original snippet).
function timestampToSeconds(timestamp) {
  return timestamp
    .split(':')
    .reduce((total, part) => total * 60 + Number(part), 0);
}

// Return the transcript lines from the last `minutes` minutes before
// `currentTime` (in seconds), formatted as "Speaker: text" with no timestamps.
function getLastNMinutesContext(transcript, currentTime, minutes) {
  const secondsBack = minutes * 60;
  const lowerBound = Math.max(0, currentTime - secondsBack);
  return transcript
    .filter(item => {
      const s = timestampToSeconds(item.timestamp);
      return s >= lowerBound && s <= currentTime;
    })
    .map(item => `${item.speaker || 'Unknown'}: ${item.tts}`)
    .join('\n');
}

5. FAQ & Gotchas

  • Transcript field for text? Use tts.
  • Speaker label missing? Fallback to "Unknown".
  • Timestamps in context? Omit them for the system prompt.
  • Agent API keys? Use public agent for now; see Cameron's code for private agent setup.

6. References


7. Next Steps

  • Implement the integration as above.
  • Test with various playhead positions and context window sizes.
  • Polish the UI/UX for seamless transitions between playback and voice calls.

For questions, contact Aman or check the code comments for further guidance.