Sydney – Ultra Fast Version. A local, multimodal voice AI for medical guidance: offline, CPU-friendly, and powered by MedGemma-4B/Gemma3:1B, Whisper Tiny (configurable), and Glow-TTS. RAG plus conversation memory delivers fast, accurate, context-aware responses.
- 🧠 Multimodal: Supports both text and voice input/output in a single version.
- 📚 Offline-first: Entirely functional without internet once models and embeddings are downloaded.
- ⚡ CPU-optimized: Fast inference even on standard CPUs without compromising response quality.
- 🛠 Context-aware memory: Stores user queries and AI responses, allowing nuanced conversations.
- 🔍 Multiple RAG pipelines: Integrates local medical documents and Granite 47M R2 embeddings for factual, precise, and comprehensive answers.
- 💡 Problem-solving powerhouse: Capable of complex multi-topic medical reasoning, combining retrieval and generative capabilities.
- 🎯 Clean & structured outputs: Markdown removal, concise formatting, and speech-ready text.
- 🚨 Emergency detection: Flags urgent situations and provides immediate guidance to contact professionals.
Sydney is designed as a unified multimodal AI assistant:
┌─────────────────────────────────────────────────────────────────────┐
│ MEDICAL AI ASSISTANT │
│ System Architecture │
└─────────────────────────────────────────────────────────────────────┘
┌──────────────┐
│ User Input │ ← Text or Voice
│ Text / Voice │
└──────┬───────┘
↓
┌──────────────────────────────┐
│ Whisper Speech-to-Text │
│ (Voice input transcription) │
└──────────────┬───────────────┘
┌──────────────────────────────┐
│ Memory Manager │
│ • Tracks last N turns │
│ • Maintains context │
│ • Preserves conversation │
└──────────────┬───────────────┘
┌──────────────────────────────┐
│ RAG Pipeline │
│ • Local medical documents │
│ • Granite 47M R2 embeddings │
│ • Semantic search & combine │
└──────────────┬───────────────┘
┌──────────────────────────────┐
│ Critical Keywords Filter │
│ Checks for: "chest pain", │
│ "difficulty breathing", etc. │
└────┬─────────────────────┬───┘
│ │
[Emergency] [Normal Query]
│ │
↓ ↓
┌────────────────┐ [Continue to LLM]
│ EMERGENCY MODE │
│ • Output alert │
│ • Bypass LLM │
│ • Contact help │
└────────────────┘
┌──────────────────────────────┐
│ LLM Processing │
│ • Concise responses │
│ • Context-aware reasoning │
│ • Multi-topic handling │
└──────────────┬───────────────┘
┌──────────────────────────────┐
│ Post-Processing & Cleanup │
│ • Markdown removal │
│ • Bullet formatting │
│ • Text normalization │
└──────────────┬───────────────┘
┌──────────────────────────────┐
│ Glow-TTS Speech Synthesis │
│ • Natural voice output │
│ • Expressive intonation │
└──────────────┬───────────────┘
┌──────────────────────────────┐
│ Gradio UI │
│ 📝 Text Output │
│ 🔊 Voice Output │
└──────────────────────────────┘
| Feature | Description |
|---|---|
| 🎤 Multimodal Input | Supports both voice (Whisper STT) and text input |
| 🧠 Intelligent Memory | Maintains conversation context across multiple turns |
| 📚 RAG Pipeline | Retrieves relevant information from local medical documents |
| 🚨 Emergency Detection | Real-time keyword filtering for critical symptoms |
| 💬 Smart Processing | Context-aware LLM with multi-topic reasoning |
| 🗣️ Natural Voice | Glow-TTS for expressive speech synthesis |
| 🖥️ User-Friendly UI | Clean Gradio interface with dual output modes |
The system includes a Critical Keywords Filter that:
- Monitors for emergency terms (e.g., "chest pain", "difficulty breathing")
- Immediately triggers emergency protocols
- Bypasses normal processing to provide instant alerts
- Recommends contacting emergency services
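A minimal sketch of how such a filter might look. The function names match the `is_medical_emergency()` and `get_emergency_response()` hooks this README describes, but the keyword list and response text here are illustrative placeholders for the project's actual configuration:

```python
# Illustrative sketch of the Critical Keywords Filter; the real keyword
# list and response text live in the project's configuration.
EMERGENCY_KEYWORDS = {
    "chest pain", "difficulty breathing", "shortness of breath", "stroke",
}

def is_medical_emergency(text: str) -> bool:
    """Return True if the user input contains a critical medical term."""
    lowered = text.lower()
    return any(keyword in lowered for keyword in EMERGENCY_KEYWORDS)

def get_emergency_response() -> str:
    """Fixed alert text returned when the filter triggers (the LLM is bypassed)."""
    return ("This may be a medical emergency. "
            "Please contact your local emergency services immediately.")
```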
- Speech-to-Text: Whisper
- LLM: MedGemma-4B (via Ollama)
- Embeddings: Granite 47M R2
- Text-to-Speech: Glow-TTS
- Interface: Gradio
- Architecture: RAG (Retrieval-Augmented Generation)
```mermaid
graph TD
    A[User Input] --> B{Input Type}
    B -->|Voice| C[Whisper STT]
    B -->|Text| D[Memory Manager]
    C --> D
    D --> E[RAG Pipeline]
    E --> F[Keyword Filter]
    F -->|Emergency| G[Alert & Bypass]
    F -->|Normal| H[LLM Processing]
    H --> I[Post-Processing]
    I --> J[Glow-TTS]
    J --> K[Gradio UI]
    G --> K
```
- User speaks or types: "I have a mild headache"
- System processes: Whisper converts voice → Memory recalls context → RAG retrieves relevant info
- Safety check: Keywords analyzed (no emergency detected)
- LLM generates: Context-aware medical advice
- Output delivered: Text + voice response via Gradio UI
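The flow above can be sketched as a single dispatch function. Everything here is illustrative: `retrieve` and `generate` stand in for the real RAG and LLM calls, and the keyword check is a stub for the actual filter:

```python
def handle_query(text: str, memory: list, retrieve, generate) -> str:
    """Illustrative pipeline: safety gate -> memory context -> retrieval -> LLM.
    `retrieve` and `generate` are placeholders for the real RAG and LLM calls."""
    # Safety check first: emergencies bypass the rest of the pipeline
    if any(k in text.lower() for k in ("chest pain", "difficulty breathing")):
        return "EMERGENCY: contact emergency services immediately."
    context = "\n".join(memory[-10:])      # last N turns of conversation
    facts = retrieve(text)                 # RAG: local docs + embeddings
    answer = generate(f"{context}\n{facts}\n{text}")
    memory.append(f"User: {text}")
    memory.append(f"Assistant: {answer}")
    return answer
```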
This is an AI assistant for informational purposes only. It is not a substitute for professional medical advice, diagnosis, or treatment. Always seek the advice of qualified health providers with questions about medical conditions.
- Critical Keywords/Emergency Filter: Bypasses regular processing and directs the user to emergency services when critical medical terms (like "chest pain") are detected.
- MedGemma-4B as the default LLM, with other MedGemma models available as options.
- Configurable Whisper Models: Can use any Whisper model ("tiny" to "large") and switch to GPU if available.
Sydney’s conversation memory is persistent, intelligent, and context-aware:
- Stores all queries and AI responses in `memory.json`.
- Memory is limited to a configurable number of recent turns (default: 10) for speed.
- Every new query is automatically enriched with recent context, ensuring:
- Coherent multi-turn conversations
- Reduced repetition
- Responses tailored to your history
Example Context-Aware Query:
Recent context:
User: I have diabetes and high blood pressure. Can I exercise daily?
Assistant: Light cardio 3-5 times/week, strength training, and regular monitoring.
Current query: What dietary changes should I implement alongside exercise?
Assistant: - Low glycemic index foods, high fiber intake ...
- Reduce sodium and processed foods ...
- Maintain protein balance for muscle health ...
Sydney now includes a Critical Keywords Filter for urgent health issues:
- Scans user input (text or speech) for life-threatening terms (e.g. “chest pain”, “shortness of breath”, “stroke”).
- If any critical medical keyword is detected:
- Bypasses normal reasoning, RAG, LLM, and TTS pipelines.
- Immediately triggers an emergency response, outputs clear instructions, and prompts to contact healthcare authorities or nearest emergency services.
- Customizable emergency keyword list and response logic in `is_medical_emergency()` and `get_emergency_response()`.
Sydney uses multiple powerful RAG systems to provide factually grounded answers:
- Local Medical Knowledge Base RAG: a curated offline document set covering diseases, symptoms, nutrition, and lifestyle.
- Granite 47M R2 Embeddings RAG: embedding-based, high-speed semantic vector search, especially effective for rare or multi-faceted medical questions.
Benefits:
- Multi-topic reasoning and recommendations
- Context-enhanced answers
- Medical outputs based on reliable, stored sources
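The embedding retrieval step amounts to ranking precomputed document vectors by similarity to the query vector. A minimal sketch with cosine similarity (the actual Granite embedding call is omitted; vectors here are assumed to be precomputed):

```python
import numpy as np

def top_k(query_vec, doc_vecs, docs, k=2):
    """Illustrative semantic search: rank precomputed document embeddings
    by cosine similarity to the query embedding and return the top k docs."""
    q = query_vec / np.linalg.norm(query_vec)
    d = doc_vecs / np.linalg.norm(doc_vecs, axis=1, keepdims=True)
    scores = d @ q                          # cosine similarity per document
    order = np.argsort(scores)[::-1][:k]    # highest similarity first
    return [docs[i] for i in order]
```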
Sydney is engineered for offline, CPU-optimized operation:
- Glow-TTS + Multi-band MelGAN for natural audio without a GPU.
- Whisper ASR: Select any model size (`tiny` to `large`), INT8 or full precision, with automatic GPU fallback for faster transcription when CUDA is detected.
- Precomputed Granite embeddings for instant retrieval.
- Chunking long responses prevents CPU overload.
- Async threading and persistent memory caching for non-blocking, fast UI.
Result: High-speed inference and voice output — even on standard CPUs, with zero reliance on cloud services.
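The response-chunking idea mentioned above can be sketched as sentence-aware splitting, so TTS synthesizes short segments rather than one long utterance. The function and its limits are illustrative, not the project's actual implementation:

```python
import re

def chunk_text(text: str, max_chars: int = 200) -> list:
    """Illustrative sentence-aware chunking: split a long response into
    short segments so TTS synthesis stays bounded on a CPU."""
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks, current = [], ""
    for s in sentences:
        if current and len(current) + len(s) + 1 > max_chars:
            chunks.append(current)          # flush when the limit is reached
            current = s
        else:
            current = f"{current} {s}".strip()
    if current:
        chunks.append(current)
    return chunks
```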
Sydney is a true problem-solving assistant:
- Multi-step reasoning: Synthesizes symptoms, conditions, and treatment options.
- Cross-topic retrieval: RAG + memory + LLM combine for actionable guidance.
- Structured answers: Always organized as clear bullet points and warnings.
- Multimodal: Seamless support for both text and voice input/output.
- Offline & CPU-Optimized: Works fully offline, lightweight even on basic hardware.
- Memory + Multi-RAG: Maintains context, retrieves the most accurate information, solves multi-topic questions.
- Granite 47M R2 Embeddings: Advanced semantic retrieval for medical reasoning.
- Critical Keyword Filter: Ensures safety by immediately routing emergencies.
- Model Flexibility: Choose any Whisper version for STT; switch to GPU if available.
- MedGemma-4B as LLM: Uses Google’s open-source “MedGemma-4B” (or switchable among available MedGemma variants), optimized for medical and health prompts.
- Problem-solving ready: Stepwise reasoning and actionable suggestions.
Example Use Case:
User: I have hypertension and mild kidney issues. Can I exercise, and what should I eat?
Assistant:
- Exercise: Low-impact cardio, yoga, 3-5 times/week
- Diet: Low sodium, moderate protein, avoid processed foods
- Monitor: Blood pressure and kidney function regularly
- Warning: Avoid strenuous exercises that elevate blood pressure rapidly
```bash
git clone https://github.com/AvijitShil/Sydney_m2.git
cd Sydney_m2
```

Create and activate a virtual environment:

```bash
python -m venv venv
source venv/bin/activate   # Linux/Mac
venv\Scripts\activate      # Windows
```

Install dependencies:

```bash
pip install -r requirements.txt
```

Or manually:

```bash
pip install torch numpy gradio soundfile coqui-tts faster-whisper langchain_ollama
```

Run the app:

```bash
python main.py
```

- Chat Tab: Type health questions → get both text and audio.
- Speak Tab: Record spoken queries → instant transcription (using Whisper), text + audio output.
- Emergency Tab: If triggered by critical keywords, instantly routes you to emergency instructions.
- Whisper Model Selection: Change the ASR model via config (`tiny`, `base`, `small`, `medium`, `large`) and select CPU/GPU mode.
- LLM Model: Default is MedGemma-4B, but you may switch to other MedGemma versions as needed.
- Memory Size: Adjust `CONFIG["max_memory_turns"]`.
- TTS Speed / Style: Change the `speed` parameter in `tts_to_file()`.
- Knowledge Base Expansion: Add medical documents to `CONFIG["medical_docs"]`.
- Emergency keywords & handling: Customize in `is_medical_emergency()` and `get_emergency_response()`.
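Putting those knobs together, a `CONFIG` dictionary might look like this. Only `max_memory_turns` and `medical_docs` are keys named in this README; the remaining keys and values are hypothetical placeholders:

```python
# Illustrative CONFIG; "max_memory_turns" and "medical_docs" are the keys
# named in this README, the rest are hypothetical placeholders.
CONFIG = {
    "whisper_model": "tiny",       # any of: tiny, base, small, medium, large
    "device": "cpu",               # switch to "cuda" when a GPU is available
    "llm_model": "medgemma-4b",    # default LLM, served via Ollama
    "max_memory_turns": 10,        # rolling conversation window
    "medical_docs": ["medical_knowledge_db/"],  # local RAG sources
}
```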
.
├─ main.py # Core application
├─ rag_system.py # RAG & emergency handling
├─ requirements.txt # Dependencies
├─ memory.json # Persistent conversation memory
├─ config.py # Configuration logic and model switching
└─ README.md
- Fork & PR for improvements
- Fine-tune TTS or LLM for naturalness
- Add medical resources or RAG connectors
- Enhance offline performance or add additional models
- Ensure no actual medical advice is hard-coded
MIT License — see LICENSE for details.
This AI assistant is for educational and informational purposes only. It cannot and should not replace professional medical advice, diagnosis, or treatment. Always consult qualified healthcare professionals for any health concerns.
- 🎤 Speech-to-Text (STT): Fully local with configurable Whisper models (tiny–large) & automatic GPU detection
- 🧠 LLM Processing: MedGemma-4B (default) via Ollama; supports all Gemma/MedGemma models
- 🔊 Text-to-Speech (TTS): GlowTTS for expressive, local synthesis
- 📚 Retrieval-Augmented Generation (RAG): Knowledge base + vector embeddings
- 💭 Memory Management: Persistent, multi-turn context
- 🚨 Emergency Detection: Critical keywords filter for urgent cases
- ⚡ CPU/GPU Optimized: Async, model caching, chunking, and low-resource safe
- Python: 3.8 or higher
- RAM: 8GB (minimum), 16GB recommended
- Storage: ~5GB free for models
- Dependencies:
- PyTorch (CPU/CUDA)
- Ollama (for LLM hosting)
- Audio drivers (mic & speaker)
- Optional: CUDA-enabled GPU for Whisper
sydney/
├── main.py
├── rag_system.py
├── config.py
├── requirements.txt
├── memory.json
├── response.json
└── medical_knowledge_db/
- 100% Local Processing: No cloud dependency, private by design
- Configurable Memory: All data stored locally; optional encryption
- Critical Keyword Filter: Protects users, instant emergency handling
- Medical Disclaimers: Always appended where appropriate
- Async Processing: Non-blocking STT, LLM, and TTS
- Model Caching: Rapid warm-up and switching
- Memory Management: Rolling context, persistent storage
- Embeddings Search: Fast RAG retrieval
- Critical Keywords Filter: Auto-flags emergencies, instantly outputs instructions
- Medical Disclaimers: Always included for health queries
- Graceful Fallbacks: Safe on low-resources or mismatched hardware
- Local-Only: Nothing leaves your device
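The model-caching behavior listed above can be approximated with a memoized loader: the first call pays the load cost, later calls for the same model name return the cached instance. This is a sketch only; the project's real loaders for Whisper, MedGemma, and Glow-TTS differ:

```python
from functools import lru_cache

@lru_cache(maxsize=4)
def load_model(name: str):
    """Illustrative cached loader: repeated requests for the same model
    name return the already-loaded instance instead of reloading weights."""
    # Stand-in for loading Whisper / MedGemma / Glow-TTS weights from disk.
    return {"name": name, "loaded": True}
```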
Sydney is not just an assistant—it’s an offline, multimodal AI medical companion powering advanced reasoning, robust safety, and privacy-focused local operation, built on state-of-the-art models and retrieval pipelines.