Installation โข Features โข Usage โข Technologies โข Contributing โข License
VaibVoice is an advanced AI-powered voice transcription application that converts your speech into intelligently formatted text. Designed to boost productivity, VaibVoice eliminates the need for manual typing while automatically formatting your content based on context.
- No More Typing: Speak naturally and let VaibVoice handle the text conversion
- Smart Formatting: Automatically formats emails, messages, and prompts appropriately
- Time Saving: Reduce the time spent on writing and formatting content
- Accessibility: Makes content creation accessible for everyone, including those with typing difficulties
- Professionals who need to create content quickly and efficiently
- Individuals with typing difficulties or RSI (Repetitive Strain Injury)
- Anyone looking to boost their writing productivity
- Content creators, programmers, and business professionals
- Powered by OpenAI's GPT-4o models for highly accurate transcription
- Support for multiple languages
- Customizable activation key for recording
- Automatically detects content type (email, message, prompt)
- Formats text appropriately based on context
- Removes formatting instructions from the final output
- Clean, modern dashboard with usage statistics
- Interactive playground for testing transcription
- Comprehensive history management
- Customizable settings
- Language selection for transcription
- Configurable recording hotkey
- Customizable notification sounds
- Selection of AI models for transcription and formatting
- Python 3.8+
- FastAPI for RESTful API
- SQLite for data storage
- OpenAI API for transcription and formatting
- React with TypeScript
- Tailwind CSS for styling
- Vite as build tool
- shadcn/ui components
- Python 3.8 or higher
- Node.js and npm
- OpenAI API key
-
Clone the repository
git clone https://github.com/yourusername/vaibvoice.git cd vaibvoice -
Install Python dependencies
pip install -r requirements.txt
-
Run the application
python run.py
The script will automatically:
- Install any missing Python dependencies
- Build the frontend if it hasn't been built already (requires Node.js)
-
Configure your OpenAI API key
- Navigate to Settings in the application
- Enter your OpenAI API key
- Save your settings
- Set your OpenAI API key in the Settings page
- Select your preferred language for transcription
- Configure your recording hotkey (default is Alt)
- Place your cursor where you want the transcription to appear
- Press and hold your configured hotkey
- Speak clearly into your microphone
- Release the key when finished speaking
- Wait for processing to complete
- Your formatted text will appear at the cursor position
For best results, start your dictation with instructions like:
- "This is an email to my colleague, format it professionally..."
- "Format this as a casual message to my friend..."
- "This is a prompt for an AI system..."
The AI will understand your instructions, format accordingly, and remove the instructions from the final text.
vaibvoice/
โโโ gui/ # Frontend React application
โโโ vaibvoice/ # Backend Python application
โ โโโ api/ # FastAPI routes and endpoints
โ โโโ core/ # Core functionality (recording, transcription)
โ โโโ db/ # Database models and repositories
โ โโโ models/ # Data models
โโโ run.py # Main entry point
โโโ requirements.txt # Python dependencies
- User activates recording with the configured hotkey
- Audio is captured and sent to the OpenAI API
- Transcription is processed and formatted
- Formatted text is returned to the user interface
- Transcription is saved to the history database
We welcome contributions to VaibVoice! Here's how you can help:
- Fork the repository
- Create a feature branch:
git checkout -b feature/amazing-feature - Make your changes
- Commit your changes:
git commit -m 'Add some amazing feature' - Push to the branch:
git push origin feature/amazing-feature - Open a Pull Request
- Voice activation mode
- Export functionality for transcription history
- Additional language support
- Mobile application
- Cloud synchronization
This project is licensed under the MIT License - see the LICENSE file for details.
