A sophisticated AI assistant with advanced features including streaming responses, G2P-enhanced text-to-speech, and emotional intelligence.
- Real-time token generation like ChatGPT/Claude
- 2.5x faster perceived speed with immediate response start
- Sentence-level TTS integration for natural conversation flow
- Comprehensive error handling with fallback to non-streaming
- Misaki G2P engine for improved pronunciation
- Custom pronunciation dictionary for technical terms
- Multi-engine TTS support (Kokoro, Edge-TTS, pyttsx3 fallbacks)
- Emotion-aware voice selection with different speaking styles
- Unicode-safe logging - all emoji/special character issues resolved
- Optimized AI model: Now using gemma3:1b-it-q4_K_M for better performance
- Advanced error monitoring with automatic recovery
- Memory optimizations and cleanup management
- Dual Modes: Serious (business) and Free (casual) personality modes
- Intelligent Switching: Automatically detects context and switches modes
- Sarcastic & Witty: Light humor and contextual sarcasm in casual mode
- Professional: Efficient and formal communication in serious mode
- Local AI: Ollama + Phi-3 Mini for privacy and speed
- Cloud Fallback: Google Gemini API for complex reasoning
- Smart Routing: Automatically chooses best AI service for each task
- Conversation History: Remembers your chats and context
- User Preferences: Learns and stores your preferences
- Personal Facts: Remembers important information about you
- Smart Retention: Automatic cleanup of old, unimportant memories
- Speech-to-Text: OpenAI Whisper (local processing)
- Text-to-Speech: pyttsx3 with personality-matched voice tones
- Wake Words: "Hey Autumn" voice activation
- Mode-Aware: Voice changes between serious and casual modes
- Draggable Widget: Always-on-top, movable interface
- Autumn Ball: Compact autumn leaf icon when minimized
- Expandable Chat: Full chat interface when expanded
- Modern Design: Autumn-themed colors and styling
- Calendar: Schedule management and conflict detection
- Web Search: Controlled access to weather, news, and information
- Task Management: Proactive reminders and follow-ups
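The automatic mode switching described above can be approximated with a keyword heuristic; the sketch below is illustrative (the cue lists and function name are assumptions, not the code in config/personality.py):

```python
import re

# Hypothetical cue lists for context-based mode detection; the real
# implementation in config/personality.py may work differently.
SERIOUS_CUES = {"urgent", "asap", "deadline", "meeting", "report", "schedule"}
CASUAL_CUES = {"lol", "haha", "thanks", "awesome", "hey"}

def detect_mode(message: str) -> str:
    """Return 'serious' or 'free' based on keyword cues in the message."""
    words = set(re.findall(r"[a-z]+", message.lower()))
    serious_hits = len(words & SERIOUS_CUES)
    casual_hits = len(words & CASUAL_CUES)
    # Bias toward serious mode on ties so business requests are never joked at.
    return "serious" if serious_hits > 0 and serious_hits >= casual_hits else "free"
```

A real detector would also weigh conversation history, not just the current message.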
Optimized for lightweight systems:
- RAM: 4GB minimum, 8GB+ recommended
- Storage: 5GB for models and dependencies
- GPU: Optional (RTX 3050 4GB works perfectly)
- CPU: Any modern processor (tested on Ryzen 5 5500H)
Resource Usage:
- Memory: <200MB typical usage
- CPU: <50% on dual-core systems
- Storage: ~3GB for AI models
- Python 3.8+
- Ollama installed and running
- GPU recommended (CUDA/ROCm) for best performance
1. Clone the repository
   git clone https://github.com/shhreyuuFTW/Autumn_AI_Assistant.git
   cd Autumn_AI_Assistant

2. Install dependencies
   pip install -r requirements.txt

3. Set up the Ollama model
   ollama pull gemma3:1b-it-q4_K_M

4. Configure API keys (optional)
   # Copy the example config and edit it with your API keys
   cp config/settings.py.example config/settings.py
   # Add your Gemini API key to config/settings.py if using cloud features

5. Run Autumn
   python app.py
Experience the real-time streaming responses:
python test_streaming_simple.py
This demonstrates:
- Token-by-token generation (like ChatGPT)
- Sentence detection for TTS integration
- Performance comparison with traditional non-streaming
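The sentence detection step can be illustrated with a small generator that buffers streamed tokens and yields complete sentences for TTS; this is a sketch under assumed names, not the code in test_streaming_simple.py:

```python
import re
from typing import Iterable, Iterator

# Illustrative sketch: buffer streamed tokens and emit whole sentences,
# so TTS can start speaking before generation finishes.
_SENTENCE_END = re.compile(r"(?<=[.!?])\s+")

def sentences_from_tokens(tokens: Iterable[str]) -> Iterator[str]:
    buffer = ""
    for token in tokens:
        buffer += token
        # Split off any complete sentences; keep the partial tail buffered.
        parts = _SENTENCE_END.split(buffer)
        for sentence in parts[:-1]:
            yield sentence.strip()
        buffer = parts[-1]
    if buffer.strip():
        yield buffer.strip()
```

Each yielded sentence can be handed to the TTS queue immediately, which is what makes the perceived latency drop.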
Test the improved pronunciation:
python demo_g2p_fixed.py
Compare traditional vs G2P-enhanced pronunciation with difficult words.
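A custom pronunciation dictionary like the one this feature describes can be approximated by rewriting known technical terms before text reaches the TTS engine; the entries and function below are illustrative, not Autumn's shipped dictionary:

```python
import re

# Illustrative pronunciation overrides for terms TTS engines often mangle;
# the real dictionary ships with the G2P integration and may differ.
PRONUNCIATIONS = {
    "SQLite": "ess cue lite",
    "PyQt6": "pie cute six",
    "API": "A P I",
}

def apply_pronunciations(text: str) -> str:
    """Replace known technical terms with phonetic spellings for TTS."""
    for term, spoken in PRONUNCIATIONS.items():
        text = re.sub(rf"\b{re.escape(term)}\b", spoken, text)
    return text
```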
Autumn_AI_Assistant/
├── app.py                # Main application entry point
├── requirements.txt      # Python dependencies
├── setup.py              # Automated setup script
├── .env                  # Environment configuration
├── config/
│   ├── personality.py    # Autumn's personality system
│   └── settings.py       # Application settings
├── core/
│   ├── brain.py          # AI intelligence and routing
│   ├── memory.py         # Memory and storage system
│   └── voice.py          # Voice input/output
├── services/
│   ├── gemini.py         # Gemini API integration
│   ├── web_search.py     # Web information access
│   └── calendar.py       # Calendar integration
├── gui/
│   └── widget.py         # PyQt6 GUI interface
└── data/
    └── autumn_memory.db  # SQLite memory database
# Required
GEMINI_API_KEY=your_key_here
# Optional Performance
MEMORY_LIMIT_MB=200
CPU_LIMIT_PERCENT=50
# Optional Features
AUTO_START_VOICE=true
DEBUG=false
LOG_LEVEL=INFO
- STT Model: Whisper Tiny (39MB, fast)
- TTS Engine: pyttsx3 (local, customizable)
- Wake Words: "autumn", "hey autumn"
- Primary: Ollama Phi-3 Mini (2.3GB, local)
- Fallback: Google Gemini Flash (cloud)
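The primary/fallback arrangement can be sketched as a wrapper that tries the local model first and falls back to the cloud on error; the client callables here are placeholders, not the actual APIs in core/brain.py or services/gemini.py:

```python
from typing import Callable

def ask_with_fallback(prompt: str,
                      local: Callable[[str], str],
                      cloud: Callable[[str], str]) -> str:
    """Try the local model first for privacy and speed; fall back to cloud."""
    try:
        return local(prompt)
    except Exception:
        # Local model unavailable (e.g. Ollama not running): use the cloud API.
        return cloud(prompt)
```

In practice, `local` would wrap an Ollama client call and `cloud` a Gemini client call.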
You: "Schedule a meeting with John tomorrow at 2 PM"
Autumn: "I'll schedule that meeting for you! Let me check for any conflicts... Done! Meeting with John set for tomorrow at 2 PM. Anything else you need, sir?"
You: "Hey Autumn"
Autumn: "Yes sir? How can I help you today?"
You: "What's my schedule like?"
Autumn: "Let me check your calendar... You have three meetings today: 9 AM with the team, 1 PM lunch with Sarah, and 4 PM project review. Looks like a busy but manageable day!"
You: "Urgent: I need the quarterly report ASAP"
Autumn: [Serious Mode] "Understood. I'll locate the quarterly report immediately and have it ready for you. Processing now."
You: "Thanks, you're awesome!"
Autumn: [Free Mode] "Aww, thank you! Just doing my job... though I do it with exceptional style!"
- Privacy: Voice and personal data stay on your machine
- Speed: Local processing for instant responses
- Reliability: Works offline for basic interactions
- Efficiency: Optimized for minimal resource usage
User Input → Mode Detection → Task Classification
                    ↓
    ├── Simple Chat ────→ Local AI (Phi-3)
    ├── Complex Task ───→ Cloud AI (Gemini)
    ├── Web Info ───────→ Web APIs
    └── Calendar ───────→ Calendar Service
                    ↓
Response + Personality → Voice/Text Output
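The task-classification step in this flow could be a lightweight keyword classifier; the sketch below mirrors the diagram's four routing targets, but the keyword lists are assumptions, not the logic in core/brain.py:

```python
def classify_task(message: str) -> str:
    """Map a message to one of the routing targets from the flow diagram."""
    text = message.lower()
    if any(w in text for w in ("schedule", "meeting", "calendar", "remind")):
        return "calendar"   # → Calendar Service
    if any(w in text for w in ("weather", "news", "search", "look up")):
        return "web"        # → Web APIs
    if any(w in text for w in ("analyze", "summarize", "explain", "write")):
        return "cloud"      # complex task → Gemini
    return "local"          # simple chat → Phi-3
```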
- SQLite Database: Persistent storage
- Smart Indexing: Fast retrieval of relevant memories
- Automatic Cleanup: Configurable retention policies
- Context Awareness: Uses memory to inform responses
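A minimal sketch of such a store, with importance-based retention cleanup, might look like this (the schema, table name, and thresholds are illustrative, not those of core/memory.py):

```python
import sqlite3
import time

# Illustrative memory store with importance-based retention;
# the schema and thresholds are assumptions, not Autumn's actual ones.
def open_memory(path: str = ":memory:") -> sqlite3.Connection:
    db = sqlite3.connect(path)
    db.execute("""CREATE TABLE IF NOT EXISTS memories (
        id INTEGER PRIMARY KEY,
        content TEXT NOT NULL,
        importance REAL NOT NULL,
        created_at REAL NOT NULL)""")
    # Index the timestamp column so retention queries stay fast.
    db.execute("CREATE INDEX IF NOT EXISTS idx_created ON memories(created_at)")
    return db

def remember(db: sqlite3.Connection, content: str, importance: float = 0.5) -> None:
    db.execute("INSERT INTO memories (content, importance, created_at) VALUES (?, ?, ?)",
               (content, importance, time.time()))

def cleanup(db: sqlite3.Connection, max_age_days: float = 30,
            min_importance: float = 0.7) -> int:
    """Delete old, unimportant memories; important ones are kept regardless of age."""
    cutoff = time.time() - max_age_days * 86400
    cur = db.execute("DELETE FROM memories WHERE created_at < ? AND importance < ?",
                     (cutoff, min_importance))
    return cur.rowcount
```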
- New Services: Add to the services/ directory
- Personality Traits: Modify config/personality.py
- Memory Types: Extend core/memory.py
- GUI Components: Update gui/widget.py
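A new module under services/ might follow a small common interface so the brain can route to it; this skeleton is a hypothetical convention, not one the repository documents:

```python
# Hypothetical skeleton for a new module under services/;
# the base-class convention is an assumption, not documented by the repo.
class Service:
    name: str = "base"

    def can_handle(self, message: str) -> bool:
        raise NotImplementedError

    def handle(self, message: str) -> str:
        raise NotImplementedError

class WeatherService(Service):
    name = "weather"

    def can_handle(self, message: str) -> bool:
        return "weather" in message.lower()

    def handle(self, message: str) -> str:
        # A real implementation would call a weather API here.
        return "It's sunny."
```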
# Test voice system
python -c "from core.voice import AutumnVoice; import asyncio; asyncio.run(AutumnVoice().initialize())"
# Test AI brain
python -c "from core.brain import AutumnBrain; print('Brain module OK')"
# Test memory
python -c "from core.memory import AutumnMemory; print('Memory module OK')"
# Enable debug mode
export DEBUG=true
export LOG_LEVEL=DEBUG
python app.py
PyAudio Installation (Windows)
# If PyAudio fails to install
pip install pipwin
pipwin install pyaudio
Ollama Connection
# Check if Ollama is running
ollama list
curl http://localhost:11434/api/tags
Voice Issues
# Test microphone access
python -c "import pyaudio; print('Microphone OK')"
Memory Issues
# Check database
sqlite3 data/autumn_memory.db ".tables"
- Use Whisper "tiny" model for fastest STT
- Limit conversation history to 50 turns
- Enable automatic memory cleanup
- Use local AI for simple interactions
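Capping conversation history, as suggested above, is a one-liner if turns are kept in a bounded deque; a minimal sketch (names are illustrative):

```python
from collections import deque

# Keep only the most recent turns; older ones fall off automatically.
MAX_TURNS = 50
history: deque = deque(maxlen=MAX_TURNS)

def add_turn(role: str, text: str) -> None:
    history.append((role, text))
```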
- Fork the repository
- Create a feature branch
- Make your changes
- Test thoroughly
- Submit a pull request
MIT License - See LICENSE file for details
- OpenAI Whisper for excellent local STT
- Ollama for making local LLMs accessible
- PyQt6 for the beautiful GUI framework
- Google Gemini for powerful cloud AI capabilities
Built with ❤️ for productivity and personality
Autumn AI Assistant - Your sophisticated digital companion