Skip to content

An end-to-end AI assistant using gTTS for multi-lingual text-to-speech and Gemini Pro API for smart responses. Experience seamless voice interaction in various languages with continuous updates and improvements!

License

Notifications You must be signed in to change notification settings

shaheennabi/Multi-lingual-AI-Assistant-with-gTTS-and-Gemini-Pro

Repository files navigation

Multi-lingual AI Assistant with gTTS and Gemini Pro

  • Caution: This is my mini_project here

Welcome to the Multi-lingual AI Assistant—the future of voice-driven interaction powered by Gemini Pro and gTTS! This AI assistant brings the power of Google’s cutting-edge models to your fingertips, enabling seamless, real-time voice interactions across multiple languages. Speak your mind, and let the AI do the rest!

Whether you want to ask a question, get a recommendation, or just chat, this assistant is ready to assist you in multiple languages. It takes voice input, processes it using Gemini Pro, and responds with text-to-speech using gTTS. Plus, you can download the speech output for offline access and share it anytime!

This isn't just a simple assistant—it's an experience!


Key Features

  • Multi-Language Support: Communicate in multiple languages with Gemini Pro’s robust capabilities—whether you're in English, Spanish, French, or many others! The assistant speaks your language.
  • Voice Input: No typing needed! Use the microphone to speak to your assistant, and it will convert your speech into text using Speech Recognition.
  • Text-to-Speech with gTTS: The assistant converts its generated responses back into speech using the Google Text-to-Speech (gTTS) API. Hear the assistant’s voice in your preferred language.
  • Downloadable Speech Output: After interacting with the assistant, get your generated speech as an audio file for offline use!
  • Streamlit UI: A stunning, easy-to-use web interface built with Streamlit to bring everything together in a beautiful package. Interact with the assistant effortlessly.

Installation Guide

Step 1: Create Your Conda Environment

Let's get your environment set up and ready to go! Open your terminal and run:

conda create --name multilingual-assistant python=3.9

Activate the env:

conda activate multilingual-assistant

Step 2: Install Dependencies

Now, install all the required dependencies using the following command:

pip install -r requirements.txt

Make sure you’ve got everything you need to make the magic happen!

Dependencies:

  • gTTS (Google Text-to-Speech): Converts the assistant’s responses into speech.
  • Gemini Pro: The language model behind all the intelligence.
  • Streamlit: For building the stunning web interface.
  • Speech Recognition: To convert your voice into text.

How to Use

Step 1: Set Up API Keys for Gemini Pro

To interact with Gemini Pro, you'll need to set up API access. Head to Google Cloud, create a project, and enable Gemini Pro. Store your API key securely and configure it in your environment.

Step 2: Launch the Streamlit Application

Now, it's time to see the magic in action. Run the following command:

streamlit run app.py

This will start the Streamlit app and open the web interface in your browser.

Step 3: Interact with the Assistant

  1. Record Your Voice: Click the Record button to start speaking.
  2. AI Processing: The assistant will listen to your speech, convert it to text, and send it to Gemini Pro for processing.
  3. Listen to the Response: The assistant will convert the AI-generated text back into speech using gTTS and play it back to you.
  4. Download the Speech: After hearing the assistant’s response, click the download button to save the speech for offline use.

Project Structure

Here’s a look at the project structure:

Multi-lingual-AI-Assistant-with-gTTS-and-Gemini-Pro/
│
├── app.py                # Streamlit UI for interaction
├── requirements.txt      # All the necessary dependencies
├── src      
     |-----helper.py
└── README.md             # Project documentation (You’re looking at it right now!)

Technologies Used

  • Gemini Pro: Google’s state-of-the-art language model for intelligent AI responses.
  • gTTS (Google Text-to-Speech): Converting text to natural-sounding speech using Google’s powerful TTS engine.
  • Streamlit: A super-fast, easy-to-use library for creating web apps with a focus on machine learning.
  • Speech Recognition: Capturing voice input and converting it to text.
  • Python 3.9: The Python version keeping everything running smoothly.

📜 License

This project is licensed under the MIT License. Check the LICENSE file for more details.


About

An end-to-end AI assistant using gTTS for multi-lingual text-to-speech and Gemini Pro API for smart responses. Experience seamless voice interaction in various languages with continuous updates and improvements!

Topics

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors

Languages