GeminiStream is an advanced multi-functional tool leveraging the power of the Multimodal Live API and Websockets. Designed for seamless real-time streaming and interaction with generative models, it supports various data modalities such as text and audio. This project showcases the efficient use of Google AI's Gemini 2.0 capabilities, emphasizing dynamic responses and robust multimodal content generation.
- β‘ Real-Time Multimodal Streaming: Generate and process text and audio in real time.
- π Websocket Integration: Provides efficient, bidirectional communication using Python websockets.
- π Secure API Management: Incorporates Google API key handling for secure and reliable connections.
- π§ Audio Processing: Converts generated audio from base64 encoding and saves it as WAV files.
- π οΈ Highly Customizable Tools: Flexible structure allows for easy addition and modification of generative tools.
git clone https://github.com/KishoreMuruganantham/GeminiStream-Real-Time-Multimodal-API-Integrator.git
cd geministreampip install -r requirements.txtDependencies:
websocketsaltairgoogle-api-core
from google.colab import userdata
MAPS_API_KEY = userdata.get('MAPS_API_KEY')
GOOGLE_API_KEY = userdata.get('GOOGLE_API_KEY')python main.pyawait send(ws, "Generate a summary of AI advancements in 2025.")Generated audio files are automatically saved as WAV files in the audio_output/ directory.
GeminiStream/
βββ main.py # Main application logic
βββ requirements.txt # Python dependencies
βββ README.md # Project documentation
βββ audio_output/ # Directory for generated audio files
- setup(): Initializes the WebSocket session for multimodal generation.
- send(): Facilitates prompt submissions and user interaction handling.
- handle_server_content(): Processes server responses, including text and audio outputs.
- wave_file(): Manages audio file creation and proper formatting.
We welcome contributions! Follow these steps to contribute:
- Fork the repository.
- Create a new branch (
git checkout -b feature-branch). - Commit your changes (
git commit -m 'Add new feature'). - Push to the branch (
git push origin feature-branch). - Submit a Pull Request for review.
This project is licensed under the MIT License.
- Google AI for the Multimodal Live API.
- Altair for visualization support.
- Websockets for seamless real-time communication.
- Author: Kishore Muruganantham
- π§ Email: kishore.muruganantham@gmail.com
β If you enjoyed this project, give it a star β and share it with your peers! π