- Caution: This is a mini project.
Welcome to the Multi-lingual AI Assistant, a voice-driven assistant powered by Gemini Pro and gTTS! It puts Google’s language model behind a simple voice interface, so you can hold spoken conversations in multiple languages. Speak your mind, and let the AI do the rest!
Whether you want to ask a question, get a recommendation, or just chat, this assistant is ready to assist you in multiple languages. It takes voice input, processes it using Gemini Pro, and responds with text-to-speech using gTTS. Plus, you can download the speech output for offline access and share it anytime!
Here is what the assistant offers:
- Multi-Language Support: Communicate in multiple languages with Gemini Pro’s robust capabilities, whether you speak English, Spanish, French, or many others! The assistant speaks your language.
- Voice Input: No typing needed! Use the microphone to speak to your assistant, and it will convert your speech into text using Speech Recognition.
- Text-to-Speech with gTTS: The assistant converts its generated responses back into speech using gTTS (Google Text-to-Speech), a Python library that wraps Google Translate’s text-to-speech API. Hear the assistant’s voice in your preferred language.
- Downloadable Speech Output: After interacting with the assistant, get your generated speech as an audio file for offline use!
- Streamlit UI: A stunning, easy-to-use web interface built with Streamlit to bring everything together in a beautiful package. Interact with the assistant effortlessly.
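To make the text-to-speech and download features concrete, here is a minimal sketch of turning a reply into MP3 bytes with gTTS. The `LANGUAGE_CODES` table and the names `resolve_lang` and `synthesize_mp3` are assumptions made for illustration; they are not taken from this project's `helper.py`.

```python
import io

# Hypothetical display-name -> gTTS language-code table; extend as needed.
LANGUAGE_CODES = {"English": "en", "Spanish": "es", "French": "fr"}


def resolve_lang(name: str, default: str = "en") -> str:
    """Return the language code for a display name, falling back to `default`."""
    return LANGUAGE_CODES.get(name.strip().title(), default)


def synthesize_mp3(text: str, language: str = "English") -> bytes:
    """Convert text to MP3 bytes with gTTS (requires network access)."""
    # Imported lazily so resolve_lang() works even without gTTS installed.
    from gtts import gTTS

    buffer = io.BytesIO()
    gTTS(text=text, lang=resolve_lang(language)).write_to_fp(buffer)
    return buffer.getvalue()
```

In a Streamlit app, the returned bytes could be played with `st.audio(data, format="audio/mp3")` and offered for download with `st.download_button("Download", data=data, file_name="response.mp3", mime="audio/mpeg")`.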
Let's get your environment set up and ready to go! Open your terminal and run:
conda create --name multilingual-assistant python=3.9
Activate the environment:
conda activate multilingual-assistant
Now, install all the required dependencies using the following command:
pip install -r requirements.txt
Make sure you’ve got everything you need to make the magic happen!
Dependencies:
- gTTS (Google Text-to-Speech): Converts the assistant’s responses into speech.
- Gemini Pro: The language model behind all the intelligence.
- Streamlit: For building the stunning web interface.
- Speech Recognition: To convert your voice into text.
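If you want to confirm that the install worked, a quick check like the one below can help. It is only a sketch: the import names in `REQUIRED_MODULES` are assumed from the usual PyPI packages for these libraries (`gTTS`, `google-generativeai`, `streamlit`, `SpeechRecognition`), and the actual `requirements.txt` may differ.

```python
import importlib.util

# Assumed import names for the four dependencies listed above.
REQUIRED_MODULES = ["gtts", "google.generativeai", "streamlit", "speech_recognition"]


def missing_modules(names):
    """Return the entries of `names` that cannot be found for import."""
    missing = []
    for name in names:
        try:
            found = importlib.util.find_spec(name) is not None
        except (ImportError, ValueError):
            # Raised when a parent package (for example `google`) is absent.
            found = False
        if not found:
            missing.append(name)
    return missing


if __name__ == "__main__":
    print("Missing:", missing_modules(REQUIRED_MODULES) or "none")
```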
To interact with Gemini Pro, you'll need to set up API access. Head to Google Cloud, create a project, and enable Gemini Pro. Store your API key securely and configure it in your environment.
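One way to keep the key out of your source code is to read it from an environment variable. The sketch below assumes the variable is called `GOOGLE_API_KEY`, that the `google-generativeai` package is installed, and that the model is named `gemini-pro`; the helper names `load_api_key` and `make_model` are hypothetical.

```python
import os


def load_api_key(env=None, name="GOOGLE_API_KEY"):
    """Read the API key from the environment and fail early if it is missing."""
    env = os.environ if env is None else env
    key = env.get(name, "").strip()
    if not key:
        raise RuntimeError(f"Set the {name} environment variable before starting the app.")
    return key


def make_model(model_name="gemini-pro"):
    """Configure the client and return a model handle (model name is an assumption)."""
    # Imported lazily so load_api_key() can be used without the SDK installed.
    import google.generativeai as genai

    genai.configure(api_key=load_api_key())
    return genai.GenerativeModel(model_name)
```

With a model handle, a reply can then be requested with `make_model().generate_content("Hello").text`.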
Now, it's time to see the magic in action. Run the following command:
streamlit run app.py
This will start the Streamlit app and open the web interface in your browser.
- Record Your Voice: Click the Record button to start speaking.
- AI Processing: The assistant will listen to your speech, convert it to text, and send it to Gemini Pro for processing.
- Listen to the Response: The assistant will convert the AI-generated text back into speech using gTTS and play it back to you.
- Download the Speech: After hearing the assistant’s response, click the download button to save the speech for offline use.
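The four steps above can be sketched as a single function that chains listening, generation, and speech synthesis. This is an illustration under assumptions, not the code in `app.py`: `Turn`, `run_turn`, and `listen_from_microphone` are hypothetical names, and microphone capture assumes the `SpeechRecognition` package with a working audio backend.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Turn:
    """One round trip: what was heard, what the model said, and the audio."""
    transcript: str
    reply: str
    audio: bytes


def run_turn(
    listen: Callable[[], str],
    generate: Callable[[str], str],
    synthesize: Callable[[str], bytes],
) -> Turn:
    """Chain speech-to-text, the language model, and text-to-speech."""
    transcript = listen()
    reply = generate(transcript)
    audio = synthesize(reply)
    return Turn(transcript, reply, audio)


def listen_from_microphone(language: str = "en-US") -> str:
    """Record one utterance and transcribe it (needs a microphone and network)."""
    # Imported lazily so run_turn() can be exercised without audio hardware.
    import speech_recognition as sr

    recognizer = sr.Recognizer()
    with sr.Microphone() as source:
        audio = recognizer.listen(source)
    return recognizer.recognize_google(audio, language=language)
```

Passing the steps in as functions keeps the flow easy to test with stand-ins, for example `run_turn(lambda: "hello", str.upper, str.encode)`, before wiring in the real microphone, Gemini Pro, and gTTS calls.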
Here’s a look at the project structure:
Multi-lingual-AI-Assistant-with-gTTS-and-Gemini-Pro/
│
├── app.py # Streamlit UI for interaction
├── requirements.txt # All the necessary dependencies
├── src/
│   └── helper.py
└── README.md # Project documentation (You’re looking at it right now!)
- Gemini Pro: Google’s state-of-the-art language model for intelligent AI responses.
- gTTS (Google Text-to-Speech): Converting text to natural-sounding speech using Google’s powerful TTS engine.
- Streamlit: A super-fast, easy-to-use library for creating web apps with a focus on machine learning.
- Speech Recognition: Capturing voice input and converting it to text.
- Python 3.9: The Python version keeping everything running smoothly.
This project is licensed under the MIT License. Check the LICENSE file for more details.