SpeakSnap is an AI-powered meeting companion that enhances real-time conversations by identifying and summarizing domain-specific terms as you speak, helping everyone stay on the same page.
Perfect for technical discussions, onboarding sessions, or interdisciplinary meetings, SpeakSnap provides live contextual explanations of complex terms right inside your video call.
## Features

- 🗣️ Real-time Audio Transcription using Azure Speech-to-Text
- 🧠 Contextual Term Detection with Google's Gemini API via LangChain
- 💡 Dynamic Popups in the frontend to display term summaries live during meetings
- 🎥 Jitsi Meet Integration for live video/audio conferencing
- 📦 Modular Architecture split into Core (AI) and Suite (App)
## Architecture

| Component | Description |
|---|---|
| Core | Python module that uses Gemini + LangChain to process domain-specific terms |
| Suite | JavaScript backend and frontend with Azure STT, Jitsi, and term popup UI |
## Installation

Clone the repository:

```bash
git clone https://github.com/your-org/speaksnap.git
cd speaksnap
```

### Core Setup

The Core handles all the AI-based processing, such as interacting with the Gemini API to detect and summarize domain-specific terms during the meeting.
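The Core's term-detection flow could look roughly like the sketch below. This is an illustrative stand-in, not the project's actual `main.py`: the prompt wording, function names, and the `gemini-1.5-flash` model choice are assumptions, and the `langchain-google-genai` package is imported lazily so the prompt helper remains usable on its own.

```python
def build_term_prompt(transcript: str) -> str:
    """Build the instruction sent to the model for one transcript chunk.

    Hypothetical helper; the real Core may phrase its prompt differently.
    """
    return (
        "Identify domain-specific terms in the following meeting transcript "
        "and give a one-sentence explanation for each.\n\n"
        f"Transcript:\n{transcript}"
    )


def detect_terms(transcript: str, model: str = "gemini-1.5-flash") -> str:
    """Send the prompt to Gemini via LangChain and return the raw reply text."""
    # Imported lazily so the prompt helper works without the package installed.
    # Requires: pip install langchain-google-genai
    from langchain_google_genai import ChatGoogleGenerativeAI

    # ChatGoogleGenerativeAI reads GOOGLE_API_KEY from the environment,
    # which is why the .env step below matters.
    llm = ChatGoogleGenerativeAI(model=model)
    return llm.invoke(build_term_prompt(transcript)).content


# Example (needs GOOGLE_API_KEY set and the package installed):
#   detect_terms("We need to reduce p99 latency on the ingest shard.")
```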
- Navigate to the `core/` directory:

  ```bash
  cd core
  ```

- Create and activate a virtual environment:

  ```bash
  python -m venv venv
  source venv/bin/activate  # On Windows: venv\Scripts\activate
  ```

- Install dependencies:

  ```bash
  pip install -r requirements.txt
  ```

- Create a `.env` file in the `core/` directory with your Google Gemini API key:

  ```
  GOOGLE_API_KEY=your_google_gemini_api_key_here
  ```

- Start the core service, which will handle term detection and summarization:
  ```bash
  python main.py
  ```

### Suite Setup

The Suite is responsible for the frontend UI and the backend WebSocket server that connects to the Core service.
#### Backend

- Navigate to the `suite/backend` directory:

  ```bash
  cd suite/backend
  ```
- Install dependencies:

  ```bash
  npm install
  ```

  If you face any issues, try:

  ```bash
  npm install vite@4.0.0
  ```
- Create a `.env` file in `suite/backend/` with the following environment variables:

  ```
  MONGO_URI=your_mongodb_connection_string
  AZURE_SPEECH_KEY=your_azure_speech_key
  AZURE_REGION=your_azure_region
  ```

- Start the backend WebSocket server, which will handle real-time speech-to-text data and interact with the Core:
  ```bash
  node server.js
  ```

#### Frontend

- Navigate to the `suite/frontend/SpeakSuit` directory:

  ```bash
  cd suite/frontend/SpeakSuit
  ```
- Install dependencies:

  ```bash
  npm install
  ```

  If any issues arise, try:

  ```bash
  npm install vite@4.0.0
  ```
- Start the frontend React app, which will display live term summaries in the meeting:

  ```bash
  npm run dev
  ```

  The app will be available at http://localhost:5173.
## API

Due to technical difficulties, we have hosted only the API server, at the following URL:

- API Endpoint: http://52.23.182.233:8080/api/summary/

The API accepts the following JSON request schema:

```json
{
  "text": "string",
  "userid": "string",
  "sessionid": "string"
}
```

The response schema is as follows:

```json
{
  "title": "Summary",
  "type": "object",
  "properties": {
    "summary": {
      "type": "string",
      "description": "An overall summary of the entire chat history up to the most recent query, in as few lines as possible, making sure to include the major components of the earlier text as well"
    },
    "sentiment": {
      "type": "string",
      "enum": ["pos", "neu", "neg"],
      "description": "The sentiment of the conversation: positive, neutral, or negative"
    },
    "name": {
      "type": ["string", "null"],
      "description": "The speaker's name, if available. Use null if the speaker is unidentified or not mentioned in the text."
    },
    "contextual_explanations": {
      "type": "array",
      "items": {
        "type": "object",
        "properties": {
          "term": {
            "type": "string",
            "description": "A term or phrase used in the conversation that might require explanation - this includes pop culture references (e.g., TV shows, movies), scientific terms, financial or economic concepts, historical or political references, technical jargon, or any other potentially unclear or domain-specific expression."
          },
          "explanation": {
            "type": "string",
            "description": "A concise explanation of the term in the context it was used, aimed at someone who may not be familiar with it."
          }
        },
        "required": ["term", "explanation"]
      },
      "description": "List of all terms or phrases in the conversation that could benefit from contextual explanation, regardless of their domain."
    }
  },
  "required": ["summary", "sentiment"]
}
```

## Running SpeakSnap

- Start the Core service (`python core/main.py`)
- Start the Suite backend (`node suite/backend/server.js`)
- Start the Suite frontend (`npm run dev` inside `suite/frontend/SpeakSuit`)
- Join a Jitsi meeting and speak, then watch contextual definitions appear live!
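As a sketch of the hosted API's request/response contract described above, here is a small stdlib-only Python client. The helper names are hypothetical, and `validate_summary_response` checks only the required parts of the published response schema, not every constraint.

```python
import json
from urllib import request

# Hosted endpoint from the API section above.
API_URL = "http://52.23.182.233:8080/api/summary/"


def build_summary_request(text: str, userid: str, sessionid: str) -> bytes:
    """Encode a request body in the schema the API expects."""
    payload = {"text": text, "userid": userid, "sessionid": sessionid}
    return json.dumps(payload).encode("utf-8")


def validate_summary_response(resp: dict) -> list:
    """Return a list of schema problems; an empty list means the response conforms."""
    problems = []
    for field in ("summary", "sentiment"):  # required by the response schema
        if field not in resp:
            problems.append(f"missing required field: {field}")
    if resp.get("sentiment") not in ("pos", "neu", "neg", None):
        problems.append("sentiment must be one of pos/neu/neg")
    for item in resp.get("contextual_explanations", []):
        if "term" not in item or "explanation" not in item:
            problems.append("each explanation needs 'term' and 'explanation'")
    return problems


def post_summary(text: str, userid: str, sessionid: str) -> dict:
    """POST a transcript chunk to the hosted endpoint (requires network access)."""
    req = request.Request(
        API_URL,
        data=build_summary_request(text, userid, sessionid),
        headers={"Content-Type": "application/json"},
    )
    with request.urlopen(req) as fh:
        return json.loads(fh.read().decode("utf-8"))
```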
## Repository Structure

- 🧠 `speaksnap-core` (Python) - Gemini + LangChain backend
- 💻 `speaksnap-suite` (JS) - Backend (Node.js) and Frontend (React)
## License

MIT License - see individual folders for details.
## Contributors

- Siddharth Karmokar (Backend API Server)
- Arnav Sharda (Frontend Developer)
- Rushikesh Iche (Backend Developer)
