This example shows how to build a voice-driven interactive storytelling experience. The bot periodically prompts the user for input, creating a 'choose your own adventure' style experience.
We use Gemini 2.0 to write the story and the image prompts, and we add visual elements by generating an image for each page with Google's Imagen.
- **Deepgram** (Speech-to-Text): transcribes inbound participant voice media to text.
- **Google Gemini 2.0** (LLM): our creative writer LLM. You can see the context used to prompt it here.
- **ElevenLabs** (Text-to-Speech): converts and streams the LLM response from text to audio.
- **Google Imagen** (Image Generation): adds pictures to our story. Prompting is key for style consistency, so we task the LLM with turning each story page into a short image prompt.
- Navigate to the client directory:

  ```shell
  cd client
  ```

- Install dependencies:

  ```shell
  npm install
  ```

- Build the client:

  ```shell
  npm run build
  ```

- Navigate to the server directory:

  ```shell
  cd ../server
  ```

- Install dependencies:

  ```shell
  uv sync
  ```

- Create the environment file and set your variables:

  ```shell
  cp env.example .env
  ```

  You'll need API keys for:

  - DAILY_API_KEY
  - ELEVENLABS_API_KEY
  - ELEVENLABS_VOICE_ID
  - GOOGLE_API_KEY
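A filled-in `.env` might look like this (all values below are placeholders, not real keys):

```
DAILY_API_KEY=your-daily-api-key
ELEVENLABS_API_KEY=your-elevenlabs-api-key
ELEVENLABS_VOICE_ID=your-voice-id
GOOGLE_API_KEY=your-google-api-key
```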
- (Optional) Deployment: when deploying to production, set `ENV` to `production` to ensure only this app can spawn new bot processes.
- Navigate back to the demo's root directory:

  ```shell
  cd ..
  ```

- Run the application:

  ```shell
  uv run server/bot_runner.py --host localhost
  ```

  You can run with a custom host or port using:

  ```shell
  uv run server/bot_runner.py --host somehost --port someport
  ```

➡️ Open the host URL in your browser: http://localhost:7860
- Wait for the `track_started` event before playing the intro, so the opening line isn't cut off.
- Show a 5-minute timer in the UI.
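The countdown behind the 5-minute timer can be sketched as a small formatting helper. The 5-minute cap and the MM:SS display format are assumptions for illustration; in the demo the timer itself would live in the client UI.

```python
# Assumed session cap of 5 minutes, shown to the user as MM:SS.
SESSION_SECONDS = 5 * 60

def time_remaining(elapsed_seconds: int) -> str:
    """Format the remaining session time as MM:SS, clamped at zero."""
    remaining = max(SESSION_SECONDS - elapsed_seconds, 0)
    return f"{remaining // 60:02d}:{remaining % 60:02d}"

time_remaining(0)    # -> "05:00"
time_remaining(185)  # -> "01:55"
time_remaining(400)  # -> "00:00" (session over)
```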
