This application is a voice-enabled AI assistant that lets you talk to your documents.
Upload your PDFs or text files and simply speak your question. The bot transcribes your voice, retrieves the most relevant passages from your knowledge base using RAG (Retrieval-Augmented Generation), and replies in a natural, human-like voice.
- Voice Input: Speak instead of typing, powered by OpenAI Whisper.
- Document Knowledge Base: Upload PDFs or TXT files to build a searchable knowledge base.
- Retrieval-Augmented Generation: Finds the most relevant info from your documents before answering.
- Text-to-Speech: Natural-sounding audio replies using Kokoro TTS.
- Real-time Interaction: Smooth and quick responses in a friendly chat interface.
- Frontend/UI: Streamlit
- Speech-to-Text (ASR): Whisper
- Text-to-Speech (TTS): Kokoro
- Document Processing & RAG: LangChain + Vector Stores
- Backend Language: Python 3.10
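
As a rough illustration of the Speech-to-Text piece of this stack, the sketch below wraps the `openai-whisper` package in a small helper. The helper name `transcribe`, the `"base"` model size, and the file path are assumptions for illustration only; the project's actual code may be organized differently.

```python
from pathlib import Path


def transcribe(audio_path: str, model_name: str = "base") -> str:
    """Return the transcript of an audio file (hypothetical helper)."""
    if not Path(audio_path).is_file():
        raise FileNotFoundError(audio_path)

    # Imported lazily so this module still loads when whisper is missing.
    # Assumption: installed via `pip install openai-whisper`.
    import whisper

    model = whisper.load_model(model_name)  # downloads weights on first use
    result = model.transcribe(audio_path)   # audio is decoded through FFmpeg
    return result["text"].strip()
```

Calling `transcribe("question.wav")` on a recorded query would return the recognized text, which can then be passed to the retrieval step.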
- Upload Documents: PDF or TXT files via the sidebar.
- Process Knowledge Base: Files are chunked, embedded, and stored in a vector database.
- Ask via Voice: Speak your query into the mic.
- RAG Retrieval: Finds and ranks relevant chunks from your uploaded content.
- Answer Generation: Summarizes and formats the best answer.
- Voice Response: Converts the answer into natural speech and plays it.
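
The chunking and retrieval steps above can be sketched in plain Python. This is a toy illustration, not the project's implementation: it swaps in word-count vectors and cosine similarity where the app would use LangChain with a real embedding model and vector store, and the names `chunk_text`, `embed`, and `retrieve` are hypothetical.

```python
import math
import re
from collections import Counter


def chunk_text(text: str, size: int = 40, overlap: int = 10) -> list[str]:
    """Split text into chunks of `size` words that overlap by `overlap` words."""
    if overlap >= size:
        raise ValueError("overlap must be smaller than size")
    words = text.split()
    chunks = []
    for start in range(0, len(words), size - overlap):
        chunks.append(" ".join(words[start:start + size]))
        if start + size >= len(words):
            break  # the last chunk already reaches the end of the text
    return chunks


def embed(text: str) -> Counter:
    """Toy embedding: lowercase word counts (stand-in for a real model)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query: str, chunks: list[str], top_k: int = 2) -> list[str]:
    """Rank chunks by similarity to the query and return the best `top_k`."""
    query_vec = embed(query)
    ranked = sorted(chunks, key=lambda c: cosine(query_vec, embed(c)), reverse=True)
    return ranked[:top_k]
```

In the real pipeline the retrieved chunks would be handed to the answer-generation step; here `retrieve(question, chunk_text(document))` simply returns the closest passages.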
git clone https://github.com/yourusername/voice-agent.git
cd voice-agent
pip install -r requirements.txt
streamlit run main.py
- Make sure FFmpeg is installed and available on your PATH; Whisper uses it to decode audio.
- Supports multiple files and multiple queries in a session.
- Best used with clear audio for optimal transcription accuracy.
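
The FFmpeg requirement from the notes above can be checked before launching the app. The snippet below is a minimal sketch using only the standard library; the function name `ffmpeg_available` is a hypothetical helper, not part of this project.

```python
import shutil


def ffmpeg_available(binary: str = "ffmpeg") -> bool:
    """Return True if the given executable can be found on the PATH."""
    return shutil.which(binary) is not None


if __name__ == "__main__":
    if ffmpeg_available():
        print("FFmpeg found.")
    else:
        print("FFmpeg not found; install it before running the app.")
```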