Skip to content

GeminiStream πŸš€ is a real-time multimodal API integrator that leverages Google AI's Gemini 2.0 🌐 and Websockets ⚑. It supports seamless streaming of text πŸ“ and audio 🎧, secure API handling πŸ”’, and customizable tools πŸ› οΈ for dynamic generative content.

License

Notifications You must be signed in to change notification settings

KishoreMuruganantham/GeminiStream-Real-Time-Multimodal-API-Integrator

Folders and files

NameName
Last commit message
Last commit date

Latest commit

Β 

History

3 Commits
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 

Repository files navigation

🌟 GeminiStream: Real-Time Multimodal API Integrator


πŸš€ Project Overview

GeminiStream is an advanced multi-functional tool leveraging the power of the Multimodal Live API and Websockets. Designed for seamless real-time streaming and interaction with generative models, it supports various data modalities such as text and audio. This project showcases the efficient use of Google AI's Gemini 2.0 capabilities, emphasizing dynamic responses and robust multimodal content generation.


πŸ’‘ Key Features

  • ⚑ Real-Time Multimodal Streaming: Generate and process text and audio in real time.
  • 🌐 Websocket Integration: Provides efficient, bidirectional communication using Python websockets.
  • πŸ”’ Secure API Management: Incorporates Google API key handling for secure and reliable connections.
  • 🎧 Audio Processing: Converts generated audio from base64 encoding and saves it as WAV files.
  • πŸ› οΈ Highly Customizable Tools: Flexible structure allows for easy addition and modification of generative tools.

πŸ› οΈ Installation Guide

1️⃣ Clone the Repository

git clone https://github.com/KishoreMuruganantham/GeminiStream-Real-Time-Multimodal-API-Integrator.git
cd geministream

2️⃣ Install Required Dependencies

pip install -r requirements.txt

Dependencies:

  • websockets
  • altair
  • google-api-core

3️⃣ Configure API Keys Securely

from google.colab import userdata
MAPS_API_KEY = userdata.get('MAPS_API_KEY')
GOOGLE_API_KEY = userdata.get('GOOGLE_API_KEY')

πŸ”„ Usage Instructions

⚑ Run the Application

python main.py

πŸ’¬ Send Prompts for Generation

await send(ws, "Generate a summary of AI advancements in 2025.")

🎧 Audio Output Handling

Generated audio files are automatically saved as WAV files in the audio_output/ directory.


πŸ’» Project Structure

GeminiStream/
β”œβ”€β”€ main.py               # Main application logic
β”œβ”€β”€ requirements.txt      # Python dependencies
β”œβ”€β”€ README.md             # Project documentation
└── audio_output/         # Directory for generated audio files

✨ Core Components

  • setup(): Initializes the WebSocket session for multimodal generation.
  • send(): Facilitates prompt submissions and user interaction handling.
  • handle_server_content(): Processes server responses, including text and audio outputs.
  • wave_file(): Manages audio file creation and proper formatting.

🌟 Contributing Guidelines

We welcome contributions! Follow these steps to contribute:

  1. Fork the repository.
  2. Create a new branch (git checkout -b feature-branch).
  3. Commit your changes (git commit -m 'Add new feature').
  4. Push to the branch (git push origin feature-branch).
  5. Submit a Pull Request for review.

πŸ“œ License

This project is licensed under the MIT License.


🀝 Acknowledgements


πŸ“¬ Contact Information


⭐ If you enjoyed this project, give it a star ⭐ and share it with your peers! 🌟

About

GeminiStream πŸš€ is a real-time multimodal API integrator that leverages Google AI's Gemini 2.0 🌐 and Websockets ⚑. It supports seamless streaming of text πŸ“ and audio 🎧, secure API handling πŸ”’, and customizable tools πŸ› οΈ for dynamic generative content.

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published