
RAG VOICE ASSISTANT πŸŽ™οΈ

Transforming Document Interactions with Voice Intelligence


Built with cutting-edge AI technologies:

FastAPI Streamlit OpenAI LangChain Python FAISS


Table of Contents

  • Overview
  • Features
  • Getting Started
  • Usage
  • API Documentation
  • Architecture
  • Contributing
  • Support

Overview

RAG Voice Assistant is an advanced AI-powered application that combines Retrieval-Augmented Generation (RAG) with voice interaction capabilities. The system allows users to upload PDF documents and interact with them through both text and voice interfaces, providing intelligent responses based on document content.

Key Capabilities

πŸ” Document Intelligence: Upload and process PDF documents with advanced text chunking and embedding
πŸŽ™οΈ Voice Interaction: Real-time speech-to-text and text-to-speech capabilities
πŸ’¬ Intelligent Chat: Context-aware responses using OpenAI's GPT models
πŸ”Š Audio Processing: Support for multiple voice models and audio formats
⚑ Real-time Processing: Live audio transcription and instant responses


Features

🎯 Core Features

  • PDF Document Processing: Advanced text extraction and chunking using PyPDF
  • Vector Search: FAISS-powered similarity search for relevant document retrieval
  • Multi-modal Interaction: Text, voice, and audio file input support
  • Real-time Transcription: Live speech-to-text using OpenAI Whisper
  • Text-to-Speech: Multiple voice options with OpenAI TTS
  • Context-aware Responses: RAG-based intelligent document querying
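The chunking step above can be sketched in a few lines. This is a simplified pure-Python illustration of overlapping fixed-size chunks; the project itself uses LangChain's RecursiveCharacterTextSplitter, and the sizes below are illustrative assumptions, not the project's actual settings:

```python
def chunk_text(text, chunk_size=500, overlap=50):
    """Split text into fixed-size chunks that overlap by `overlap` characters,
    so context spanning a chunk boundary is not lost at retrieval time."""
    if chunk_size <= overlap:
        raise ValueError("chunk_size must exceed overlap")
    chunks = []
    step = chunk_size - overlap  # advance less than chunk_size to create overlap
    for start in range(0, len(text), step):
        chunk = text[start:start + chunk_size]
        if chunk:
            chunks.append(chunk)
    return chunks

sample = "word " * 300  # stand-in for text extracted from a PDF
chunks = chunk_text(sample, chunk_size=500, overlap=50)
```

Each chunk shares its last 50 characters with the start of the next one, which is what lets the retriever surface a passage even when the relevant sentence straddles a boundary.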

πŸ› οΈ Technical Features

  • FastAPI Backend: High-performance async API with automatic documentation
  • Streamlit Frontend: Interactive web interface with multiple tabs
  • WebRTC Integration: Real-time audio streaming capabilities
  • Modular Architecture: Separate backend and frontend for scalability
  • Error Handling: Comprehensive logging and error management
  • File Management: Automatic cleanup and temporary file handling

Getting Started

Prerequisites

This project requires the following dependencies:

  • Programming Language: Python 3.8+
  • Package Manager: pip
  • API Keys: OpenAI API key (required)
  • Audio Support: System audio drivers for voice features

Installation

Clone the repository and install its dependencies:

  1. Clone the repository:

    git clone https://github.com/hparreao/rag-voice-assistant.git
  2. Navigate to the project directory:

    cd rag-voice-assistant
  3. Install the dependencies:

    Using pip:

    pip install -r requirements.txt

Configuration

  1. Set up environment variables: Create a .env file in the root directory:

    OPENAI_API_KEY=your_openai_api_key_here
  2. Verify installation:

    python -c "import openai; print('OpenAI installed successfully')"
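A minimal sketch of how the backend can pick up the key at runtime. The helper name below is illustrative, not part of the project; the project may also load the .env file automatically via python-dotenv:

```python
import os

def get_openai_key():
    """Return the OpenAI API key from the environment, failing loudly if unset."""
    key = os.environ.get("OPENAI_API_KEY", "")
    if not key:
        raise RuntimeError(
            "OPENAI_API_KEY is not set - create a .env file or export it"
        )
    return key

# Demo only: provide a placeholder so the snippet runs without a real key.
os.environ.setdefault("OPENAI_API_KEY", "sk-placeholder")
key = get_openai_key()
```

Failing fast on a missing key at startup gives a clearer error than a mid-request authentication failure from the OpenAI API.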

Usage

Running the Backend

Start the FastAPI server with hot reload:

uvicorn main:app --reload

The API will be available at http://localhost:8000 by default, with interactive documentation at http://localhost:8000/docs.

Running the Frontend

Launch the Streamlit interface:

streamlit run frontend.py

The web interface will open at: http://localhost:8501

Using the Application

1. Document Upload

  • Navigate to the sidebar "Gerenciamento de Documentos" (Document Management)
  • Upload one or more PDF files
  • Click "Processar" (Process) to index the documents

2. Chat Interface

  • Use the "Chat" tab for text-based questions
  • Ask questions about your uploaded documents
  • Receive both text and audio responses

3. Voice Input

  • Switch to the "Entrada por Voz" (Voice Input) tab
  • Grant microphone permissions
  • Speak your questions naturally
  • View real-time transcription

4. Audio Features

  • Text-to-Speech: Convert any text to audio with voice selection
  • Audio-to-Text: Upload MP3 files for transcription
  • Voice Models: Choose from 6 different voice options
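A hypothetical helper for preparing a /text-to-audio call. The field names ("text", "voice") are assumptions, since the README does not document the form schema; the voice list matches OpenAI's six standard TTS voices, which the backend most likely passes through:

```python
# OpenAI's TTS API ships six standard voices; the README's "6 different
# voice options" presumably maps onto these.
OPENAI_TTS_VOICES = ("alloy", "echo", "fable", "onyx", "nova", "shimmer")

def build_tts_form(text, voice="alloy"):
    """Validate inputs and return the multipart form fields for /text-to-audio.
    Field names are assumptions about the backend's schema."""
    if voice not in OPENAI_TTS_VOICES:
        raise ValueError(f"unknown voice {voice!r}; choose one of {OPENAI_TTS_VOICES}")
    return {"text": text, "voice": voice}

form = build_tts_form("Hello from the RAG Voice Assistant", voice="nova")
```

Validating the voice name client-side turns a server-side 500 into an immediate, readable error.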

API Documentation

Core Endpoints

Document Management

POST /upload
Content-Type: multipart/form-data

Upload and process PDF documents for indexing.

Query Processing

POST /query
Content-Type: application/json

{
  "question": "Your question about the documents"
}
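A stdlib-only client sketch for the query endpoint. The base URL assumes uvicorn's default port; the request is built but not sent, so the backend does not need to be running:

```python
import json
from urllib import request

API_URL = "http://localhost:8000"  # uvicorn's default address (assumption)

def build_query_request(question):
    """Build (but do not send) the POST /query request."""
    body = json.dumps({"question": question}).encode("utf-8")
    return request.Request(
        f"{API_URL}/query",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )

req = build_query_request("What are the key findings in the report?")
# With the backend running, send it with:
#   answer = json.loads(request.urlopen(req).read())
```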

Audio Processing

POST /text-to-audio
Content-Type: multipart/form-data

Convert text to speech with voice selection.
POST /audio-to-text
Content-Type: multipart/form-data

Transcribe audio files to text using Whisper.

Response Formats

Query Response:

{
  "response": "AI-generated answer based on document content"
}

Error Response:

{
  "detail": "Error description"
}
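A small sketch of handling the two documented response shapes: a successful query returns a "response" key, while FastAPI reports errors under "detail":

```python
def parse_query_response(payload):
    """Extract the answer from a /query response, raising on API errors."""
    if "response" in payload:
        return payload["response"]
    if "detail" in payload:
        raise RuntimeError(f"API error: {payload['detail']}")
    raise ValueError(f"unexpected payload: {payload!r}")

answer = parse_query_response({"response": "The report covers Q3 revenue."})
```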

Architecture

System Design

β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”    β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”    β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚   Streamlit     β”‚    β”‚    FastAPI      β”‚    β”‚    OpenAI       β”‚
β”‚   Frontend      │◄──►│    Backend      │◄──►│    Services     β”‚
β”‚                 β”‚    β”‚                 β”‚    β”‚                 β”‚
β”‚ β€’ Chat UI       β”‚    β”‚ β€’ RAG Pipeline  β”‚    β”‚ β€’ GPT Models    β”‚
β”‚ β€’ Voice Input   β”‚    β”‚ β€’ Audio Proc.   β”‚    β”‚ β€’ Whisper       β”‚
β”‚ β€’ File Upload   β”‚    β”‚ β€’ Vector Store  β”‚    β”‚ β€’ TTS           β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜    β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜    β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
                              β”‚
                              β–Ό
                       β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
                       β”‚    FAISS        β”‚
                       β”‚  Vector Store   β”‚
                       β”‚                 β”‚
                       β”‚ β€’ Embeddings    β”‚
                       β”‚ β€’ Similarity    β”‚
                       β”‚ β€’ Search        β”‚
                       β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜

Data Flow

  1. Document Processing: PDFs β†’ Text Chunks β†’ Embeddings β†’ Vector Store
  2. Query Processing: User Input β†’ Similarity Search β†’ Context Retrieval β†’ LLM β†’ Response
  3. Audio Processing: Voice Input β†’ Whisper β†’ Text β†’ Query Pipeline β†’ TTS β†’ Audio Output
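The query pipeline above can be sketched end to end with toy components. Word-overlap scoring stands in for OpenAI embeddings plus FAISS, and the "LLM" is a stub that echoes the retrieved context; the real system embeds chunks and calls GPT-3.5-turbo:

```python
def score(query, chunk):
    """Toy relevance score: shared lowercase words between query and chunk."""
    return len(set(query.lower().split()) & set(chunk.lower().split()))

def retrieve(query, chunks, k=2):
    """Return the k chunks most relevant to the query (stands in for FAISS)."""
    return sorted(chunks, key=lambda c: score(query, c), reverse=True)[:k]

def answer(query, chunks):
    """Assemble retrieved context into a response (stub in place of the LLM)."""
    context = " ".join(retrieve(query, chunks))
    return f"Answer based on: {context}"

docs = [
    "The invoice total is 420 euros.",
    "Shipping takes five business days.",
    "Returns are accepted within 30 days.",
]
result = answer("How long does shipping take?", docs)
```

The structure mirrors the real flow: score every chunk against the query, keep the top k, and hand that context to the generator.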

Key Components

  • Document Loader: PyPDFLoader for PDF text extraction
  • Text Splitter: RecursiveCharacterTextSplitter for intelligent chunking
  • Embeddings: OpenAI embeddings for semantic search
  • Vector Store: FAISS for efficient similarity search
  • LLM: OpenAI GPT-3.5-turbo for response generation
  • Audio Processing: OpenAI Whisper (STT) and TTS models

Contributing

We welcome contributions to improve the RAG Voice Assistant! Here's how you can help:

Development Setup

  1. Fork the repository
  2. Create a feature branch: git checkout -b feature/your-feature
  3. Make your changes and test thoroughly
  4. Commit your changes: git commit -m "Add your feature"
  5. Push to the branch: git push origin feature/your-feature
  6. Open a Pull Request

Areas for Contribution

  • 🌍 Internationalization: Add support for more languages
  • 🎨 UI/UX: Improve the frontend interface
  • πŸ”§ Performance: Optimize vector search and processing
  • πŸ“± Mobile: Add mobile-responsive design
  • πŸ§ͺ Testing: Add comprehensive test coverage
  • πŸ“š Documentation: Improve docs and examples

Support

If you encounter any issues or have questions, please open an issue on the repository's issue tracker.


Made with ❀️ by Hugo Parreão

⭐ Star this project β€’ 🍴 Fork it β€’ πŸ“’ Report Issues
