A sophisticated AI companion powered by Deepgram's streaming text-to-speech and OpenAI's conversational AI.
- π Interactive Intro: Engaging animated welcome sequence with Lottie animations
- π€ Conversational AI: Contextual responses with authentic mAsK personality
- π Streaming TTS: Real-time text-to-speech using Deepgram's WebSocket API
- π¬ Smart Chat: Context-aware conversations with memory
- π Authentic Personality: MohammedAnas Shakil Kazi (mAsK) - A real persona, not an AI assistant
- ποΈ Voice Interaction: Record and receive voice responses with seamless audio processing
- οΏ½ Fallback System: Robust error handling with automatic fallback to REST API
pip install -r requirements.txtCreate a .env file in the project directory:
OPENAI_API_KEY=your_openai_api_key_here
DEEPGRAM_API_KEY=your_deepgram_api_key_herestreamlit run app.pyThe main app now features Deepgram streaming TTS. For the ElevenLabs version (if you prefer it):
streamlit run app_elevenlabs.pyVoicebot/
βββ app.py # Main application with streaming Deepgram TTS
βββ app_elevenlabs.py # Alternative version with ElevenLabs TTS
βββ requirements.txt # Dependencies
βββ .env # API keys (create this)
βββ assets/ # Static assets
β βββ loading_animation.json # Lottie animation
βββ README.md # This file
- WebSocket Streaming: Real-time text-to-speech using Deepgram's WebSocket API
- Fallback Mechanism: Automatic fallback to REST API if streaming fails
- High Quality Audio: Aura-2 model for natural voice synthesis
- Efficient Processing: Optimized for low-latency audio delivery
- Welcome Sequence: Engaging introduction with Lottie animations
- Voice Input: Record voice messages for conversation using Whisper
- Streaming Audio: Real-time audio generation and playback
- Interactive UI: Dynamic elements with modern chat interface
- Authentic Personality: mAsK personality with genuine human-like responses
- Context Memory: Maintains conversation state across interactions
- Dual Interface: Both text and voice chat modes
- Error Handling: Robust error management with user-friendly messages
- Persistent History: Conversations saved during session
- Streamlit Chat UI: Native chat interface
- Real-time Responses: Instant AI responses
- Theme Integration: Chat UI matches selected theme
MohammedAnas Shakil Kazi is an INFP personality who embodies:
- Deep introspection and empathy
- Authentic, vulnerable conversations
- Creative and poetic expression
- Gentle humor with slight awkwardness
- Meaningful connections over small talk
Make sure your .env file contains:
OPENAI_API_KEY=your_openai_api_key_here
DEEPGRAM_API_KEY=your_deepgram_api_key_here- Streaming TTS: Enabled by default with WebSocket API
- Fallback System: Automatic REST API fallback on streaming failure
- Audio Format: Linear16 encoding, 24kHz sample rate
- Voice Model: Aura-2-Arcas-EN for natural voice synthesis
- Auto-play: Enabled for immediate audio response
- Dual Modes: Text chat and voice chat tabs
- Session Persistence: Chat history maintained during session
- Real-time Updates: Instant message display and audio generation
- Error Handling: User-friendly error messages and recovery
- Streaming TTS: Deepgram WebSocket API for real-time audio generation
- OpenAI Integration: GPT-4 models for conversations with mAsK personality
- Speech Recognition: Whisper API for voice-to-text conversion
- Fallback System: Automatic REST API fallback if streaming fails
- Session Management: Persistent chat history and state management
- Error Handling: Comprehensive error handling with user feedback
- ElevenLabs TTS: High-quality voice synthesis with ElevenLabs API
- Voice Cloning: Custom voice models for personalized responses
- Audio Processing: Optimized audio generation and playback
- Streamlit Interface: Modern web-based chat interface
- Implement additional Deepgram voice models
- Add voice speed and pitch controls
- Optimize WebSocket connection handling
- Implement audio caching for better performance
- Add real-time voice activity detection
- Implement voice interruption handling
- Enhance audio quality processing
- Add support for multiple languages
- Add conversation export/import
- Implement user preferences storage
- Add more personality variations
- Enhance error recovery mechanisms
- Python 3.8+
- Streamlit 1.28+
- OpenAI API key
- Deepgram API key
- Internet connection
- Microphone access (for voice features)
- Modern web browser with audio support
- WebSocket connection failed: Check Deepgram API key and internet connection
- No audio output: Verify browser audio permissions and settings
- Fallback to REST API: Normal behavior when streaming fails, check console for details
- Audio quality issues: Ensure stable internet connection for streaming
- No intro audio: Verify assets/intro.mp3 file exists and is accessible
- Recording issues: Check microphone permissions in browser
- Playback problems: Verify browser audio settings and autoplay permissions
- Animation not loading: Check assets/loading_animation.json file
- OpenAI errors: Verify API key in
.envfile and check usage limits - Deepgram errors: Verify API key and check account balance
- Rate limiting: Wait a moment and try again, or upgrade API plan
- Model not found: Update to latest model versions in code
- deepgram-sdk problems: Try
pip install --upgrade deepgram-sdk - Audio dependencies: Install platform-specific audio libraries
- Permission errors: Run with administrator privileges
- Module not found: Ensure all requirements are installed with
pip install -r requirements.txt
Feel free to enhance mAsK with:
- Additional Deepgram voice models
- Real-time conversation features
- Advanced streaming optimizations
- Mobile app version
- Voice activity detection
- Multiple language support
This project is open source. Feel free to use and modify for personal or educational purposes.
"In a world of artificial intelligence, let's not forget to be authentically human." - mAsK πβ¨