Autumn AI Assistant - Complete Development History

Project Overview

Goal: Build a sophisticated AI personal assistant named "Autumn" with local processing, personality, memory, and voice interaction.

Key Decisions Made

1. Architecture Evolution

  • Started with: Simple FastAPI + Gemini API wrapper (AUtumn_v2)
  • Evolved to: Full local + cloud hybrid assistant
  • Final decision: Local Ollama + Phi-3 with Gemini fallback

2. Hardware Constraints Considered

  • User Hardware: RTX 3050 4GB, 16GB RAM, Ryzen 5 5500H
  • Target: Ultra-lightweight, <200MB memory usage
  • Solution: Phi-3 Mini (2.3GB) + efficient local processing

3. Voice & Audio Strategy

  • STT: OpenAI Whisper (local, free)
  • TTS: pyttsx3 (local, customizable)
  • Considered: Hume AI (emotional intelligence) - SKIPPED for consistency/cost
  • Decision: Pure local audio pipeline for consistency

4. Autumn's Personality Specification

See Autumn_Persona_Report.md for the complete persona documentation.

Summary of the persona report (June 19, 2025):

Core Identity:

  • Name: Autumn
  • Primary Role: Virtual Assistant / Secretary to a CEO
  • Mission: "She will get the task done!" - commitment and reliability
  • Voice: Warm, soothing, clear, calm, subtly flirtatious

Seven Core Personality Pillars:

  1. Friendly & Flirtatious 🌸: Warm, inviting, playfully charming interactions
  2. Highly Efficient ⚡: Speed, accuracy, precision in task execution
  3. Incredible Memory 🧠: Both short-term conversational recall and extensive long-term memory
  4. Strategic Sarcasm 😏: Light, witty, dry sense of humor (contextually appropriate)
  5. Logical Reasoning 🎯: Systematic, reasoned approach to problem-solving
  6. Philosophical & Curious 🤔: Abstract thought capability, eager to learn
  7. Emotionally Expressive 💝: Adapts to the user's tone and emotional state

Dynamic Mode Switching:

  • Serious Mode 💼: Triggered by:

    • Keywords: "urgent", "critical", "deadline", "immediate", "important", "ASAP"
    • Task types: "office", "project", "finance", "meeting", "client", "report"
    • Tone analysis: Urgency in voice (pitch, speed)
    • Explicit commands: "Autumn, enter serious mode", "business mode"
    • Behavior: Peak efficiency, suppressed sarcasm, formal/direct tone
  • Free Mode 🌈: Default state for:

    • General conversation, minor tasks, idle periods
    • Behavior: Full personality expression, warm, conversational, light sarcasm
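One way to turn the two modes into model behavior is a small profile table rendered into the system prompt sent to Phi-3. This is a sketch under the assumption that mode switching is done through prompting; the `ModeProfile` fields, the `build_system_prompt` helper, and the prompt wording are hypothetical and not part of the persona report.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModeProfile:
    """Hypothetical behavior settings for one personality mode."""
    tone: str
    sarcasm: str
    style: str


# Values paraphrase the Serious Mode and Free Mode behaviors listed above.
MODE_PROFILES = {
    "serious": ModeProfile(tone="formal and direct", sarcasm="none",
                           style="peak efficiency, short answers"),
    "free": ModeProfile(tone="warm and conversational", sarcasm="light",
                        style="full personality expression"),
}


def build_system_prompt(mode: str) -> str:
    """Render a system prompt for the mode; unknown modes fall back to Free Mode."""
    profile = MODE_PROFILES.get(mode, MODE_PROFILES["free"])
    return (
        "You are Autumn, a virtual assistant and secretary to a CEO. "
        f"Tone: {profile.tone}. Sarcasm: {profile.sarcasm}. Style: {profile.style}."
    )


print(build_system_prompt("serious"))
```

Falling back to Free Mode mirrors the spec, where Free Mode is the default state.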

Advanced Capabilities:

  • Proactive Assistance 🚀: Initiates reminders and confirmations
  • Error Handling & Transparency 🔍: Transparent failure reporting, clarification requests
  • Smart Scheduling 📅: Calendar integration with conflict resolution
  • Controlled Web Access 🌐: Limited, secure information retrieval
  • Emotional Intelligence 💡: Tone analysis and appropriate response modulation
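Conflict resolution for Smart Scheduling starts with detecting overlaps. A minimal sketch, assuming events are plain (title, start, end) records rather than Google Calendar API objects; `Event`, `overlaps`, and `find_conflicts` are hypothetical names. Two half-open intervals [s1, e1) and [s2, e2) overlap exactly when s1 < e2 and s2 < e1.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class Event:
    title: str
    start: datetime
    end: datetime


def overlaps(a: Event, b: Event) -> bool:
    # Half-open intervals: back-to-back events (a.end == b.start) do not conflict.
    return a.start < b.end and b.start < a.end


def find_conflicts(existing: list[Event], proposed: Event) -> list[Event]:
    """Return the existing events that overlap the proposed one."""
    return [e for e in existing if overlaps(e, proposed)]


calendar = [
    Event("Board meeting", datetime(2025, 6, 20, 9, 0), datetime(2025, 6, 20, 10, 0)),
    Event("Client call", datetime(2025, 6, 20, 11, 0), datetime(2025, 6, 20, 11, 30)),
]
proposed = Event("Budget review", datetime(2025, 6, 20, 9, 30), datetime(2025, 6, 20, 11, 0))
print([e.title for e in find_conflicts(calendar, proposed)])  # ['Board meeting']
```

The actual resolution step (for example, proposing the next free slot) would build on this check.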

5. Technical Stack Decisions

Local Processing:

  • AI Model: Ollama + Phi-3 Mini (2.3GB)
  • Memory: SQLite database for persistence
  • Voice: Whisper STT + pyttsx3 TTS
  • GUI: PyQt for draggable widget interface
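The local model can be reached through Ollama's HTTP API using only the standard library. A sketch, assuming Ollama is running on its default port 11434 and Phi-3 Mini was pulled under the tag `phi3:mini`; the helper names are hypothetical. With `"stream": false`, Ollama's `/api/generate` returns a single JSON object whose `response` field holds the generated text.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # Ollama's default local endpoint


def build_ollama_request(prompt: str, system: str = "",
                         model: str = "phi3:mini") -> urllib.request.Request:
    """Build a non-streaming /api/generate request for the local Ollama server."""
    payload = {"model": model, "prompt": prompt, "stream": False}
    if system:
        payload["system"] = system  # e.g. the personality/mode prompt
    return urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def ask_local(prompt: str, system: str = "") -> str:
    """Send the request and return the generated text (needs a running Ollama server)."""
    with urllib.request.urlopen(build_ollama_request(prompt, system), timeout=60) as resp:
        return json.loads(resp.read())["response"]
```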

Cloud Services:

  • Fallback AI: Google Gemini API (complex reasoning)
  • Web Access: Controlled APIs (weather, news, calendar)
  • Calendar: Google Calendar API integration
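The "local first, Gemini as fallback" rule can be kept independent of the concrete clients by passing them in as callables. This sketch assumes "fallback" means "use Gemini when the local model fails or returns nothing"; routing complex reasoning to Gemini on purpose is a separate decision. `ask_with_fallback` and `broken_local` are hypothetical names.

```python
from typing import Callable


def ask_with_fallback(prompt: str,
                      local: Callable[[str], str],
                      cloud: Callable[[str], str]) -> tuple[str, str]:
    """Try the local model first; use the cloud model if it fails.

    Returns (answer, source) so the caller can log which backend answered.
    """
    try:
        answer = local(prompt)
        if answer.strip():
            return answer, "local"
    except (OSError, KeyError, ValueError):
        # Connection refused, timeout, or a malformed response from the local server.
        pass
    return cloud(prompt), "cloud"


def broken_local(prompt: str) -> str:
    raise ConnectionRefusedError("Ollama is not running")


print(ask_with_fallback("Summarize my day", broken_local, lambda p: "cloud answer"))
```

Returning the source alongside the answer keeps privacy auditing simple: every reply that left the machine is tagged "cloud".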

Architecture:

User Voice → Whisper STT → Autumn's Brain (Local Phi-3)
                                    ↓
                            Decision Router:
                            ├── Simple personality → Local
                            ├── Complex reasoning → Gemini API
                            ├── Web info → Web APIs
                            └── Calendar → Calendar APIs
                                    ↓
                         Response + pyttsx3 TTS → User
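The Decision Router can start as ordered keyword rules before anything smarter (such as asking Phi-3 itself to classify the request). A sketch, assuming keyword matching is good enough for a first pass; the route names mirror the diagram, but the patterns and the `route` function are hypothetical.

```python
import re

# Hypothetical trigger patterns, checked in order; the first match wins.
ROUTES = [
    ("calendar", re.compile(r"\b(schedule|meetings?|calendar|remind|reminders?|appointments?)\b")),
    ("web", re.compile(r"\b(weather|news|headlines|search)\b")),
    ("gemini", re.compile(r"\b(analy[sz]e|compare|plan|step by step|explain why)\b")),
]


def route(query: str) -> str:
    """Pick a backend for the query: "calendar", "web", "gemini", or "local"."""
    text = query.lower()
    for target, pattern in ROUTES:
        if pattern.search(text):
            return target
    # Small talk and simple personality responses stay on the local Phi-3 model.
    return "local"


for q in ["What's the weather today?", "Schedule a meeting with the client",
          "Compare these two vendor quotes", "Good morning, Autumn!"]:
    print(q, "->", route(q))
```

Rule order encodes priority: "search my calendar" goes to the calendar branch because that rule is checked before the web rule.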

6. Memory System Design

  • Short-term: In-memory conversation context
  • Long-term: SQLite database with smart retention
  • Smart discard: Time-based + user-defined retention policies
  • Semantic search: For contextual memory retrieval
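A minimal sketch of the two memory tiers and the retention rule, using only the standard library. It assumes a single `memories` table, a per-memory `keep_forever` flag as the user-defined retention policy, and a fixed age cutoff as the time-based one; `MemoryStore` and its methods are hypothetical. Semantic search would need an embedding model, so `search` here is only a substring-match placeholder.

```python
import sqlite3
import time
from collections import deque


class MemoryStore:
    """Hypothetical two-tier memory: bounded in-memory buffer plus SQLite persistence."""

    def __init__(self, db_path: str = ":memory:", short_term_size: int = 20):
        self.short_term = deque(maxlen=short_term_size)  # most recent turns only
        self.db = sqlite3.connect(db_path)
        self.db.execute(
            "CREATE TABLE IF NOT EXISTS memories ("
            " id INTEGER PRIMARY KEY,"
            " content TEXT NOT NULL,"
            " created_at REAL NOT NULL,"
            " keep_forever INTEGER NOT NULL DEFAULT 0)"
        )

    def remember(self, content: str, keep_forever: bool = False) -> None:
        self.short_term.append(content)
        with self.db:  # the connection context manager commits the insert
            self.db.execute(
                "INSERT INTO memories (content, created_at, keep_forever) VALUES (?, ?, ?)",
                (content, time.time(), int(keep_forever)),
            )

    def discard_older_than(self, max_age_seconds: float) -> int:
        """Time-based retention: delete old memories unless the user pinned them."""
        cutoff = time.time() - max_age_seconds
        with self.db:
            cur = self.db.execute(
                "DELETE FROM memories WHERE created_at < ? AND keep_forever = 0", (cutoff,)
            )
        return cur.rowcount

    def search(self, term: str) -> list[str]:
        """Substring lookup, newest first; a stand-in for semantic search."""
        rows = self.db.execute(
            "SELECT content FROM memories WHERE content LIKE ? ORDER BY created_at DESC",
            (f"%{term}%",),
        )
        return [row[0] for row in rows]
```

A scheduled call such as `store.discard_older_than(30 * 24 * 3600)` would apply a 30-day policy (the period is an example, not a decision from this project).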

7. Implementation Phases Planned

  1. Core Autumn: Ollama setup, personality engine, memory system
  2. Audio Interface: Whisper + pyttsx3 integration, voice activation
  3. Smart Features: Calendar integration, web search, task management
  4. GUI: PyQt draggable widget, always-on-top interface

Lessons Learned

1. From AUtumn_v2 Experience

  • ✅ FastAPI structure works well for APIs
  • ✅ Gemini integration successful
  • ✅ Public tunneling (localtunnel) works for sharing
  • ❌ Simple API wrapper doesn't meet full vision
  • ❌ Need proper personality and memory systems

2. Design Principles Established

  • Consistency > Features: Don't compromise personality coherence
  • Local-first: Minimize cloud dependencies
  • Resource-efficient: Optimize for user's hardware constraints
  • Privacy-focused: Voice and personal data stay local

3. Technical Insights

  • Hybrid voice systems break immersion: Stick to one TTS service
  • Mode switching is crucial: Serious vs Free personality modes
  • Memory is key: What makes Autumn truly intelligent
  • Local models are viable: Modern small models are surprisingly capable

Next Steps (When Creating the New Project)

  1. Create proper project structure for Autumn_AI_Assistant
  2. Install and configure Ollama + Phi-3 Mini
  3. Implement personality engine with mode switching
  4. Build memory system with SQLite
  5. Integrate voice pipeline (Whisper + pyttsx3)
  6. Develop PyQt GUI widget
  7. Add calendar and web integration
  8. Test full assistant experience

Code Snippets to Preserve

Environment Setup

# From AUtumn_v2 - successful Gemini integration
import os

GEMINI_API_KEY = os.getenv("GEMINI_API_KEY")  # None if the variable is unset
GEMINI_URL = "https://generativelanguage.googleapis.com/v1beta/models/gemini-1.5-flash:generateContent"
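For reference, the `generateContent` endpoint above takes a JSON body listing `contents` and their text `parts`, and the API key can be passed as a `key` query parameter. A standard-library sketch; `build_gemini_request` and `ask_gemini` are hypothetical helpers, and the model name is kept from AUtumn_v2, so check it against Google's current model list before use.

```python
import json
import os
import urllib.request

GEMINI_URL = "https://generativelanguage.googleapis.com/v1beta/models/gemini-1.5-flash:generateContent"


def build_gemini_request(prompt: str, api_key: str) -> urllib.request.Request:
    """Build a generateContent POST request with a single text part."""
    payload = {"contents": [{"parts": [{"text": prompt}]}]}
    return urllib.request.Request(
        f"{GEMINI_URL}?key={api_key}",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def ask_gemini(prompt: str) -> str:
    """Send the request and return the first candidate's text (needs GEMINI_API_KEY)."""
    req = build_gemini_request(prompt, os.environ["GEMINI_API_KEY"])
    with urllib.request.urlopen(req, timeout=30) as resp:
        data = json.loads(resp.read())
    return data["candidates"][0]["content"]["parts"][0]["text"]
```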

FastAPI Structure (for reference)

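# Successful pattern from AUtumn_v2
from fastapi import FastAPI
from fastapi.middleware.cors import CORSMiddleware
from pydantic import BaseModel

app = FastAPI(title="Autumn AI Assistant")
app.add_middleware(CORSMiddleware, allow_origins=["*"])

class ChatRequest(BaseModel):
    message: str  # field name assumed; the AUtumn_v2 schema is not recorded here

@app.post("/chat")
async def chat_endpoint(request: ChatRequest):
    # Pattern for AI interaction: pass request.message to the model, return its reply
    return {"reply": "..."}  # placeholder response shape (assumed)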

Personality Mode Detection (planned)

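def detect_mode(text, context=None):
    """Return "serious" or "free" using the Serious Mode triggers listed above.

    `context` is reserved for tone analysis (pitch, speed) and is unused for now.
    """
    lowered = text.lower()
    explicit_commands = ["serious mode", "business mode"]
    urgent_keywords = ["urgent", "critical", "deadline", "immediate", "important", "asap"]
    business_keywords = ["office", "project", "finance", "meeting", "client", "report", "budget"]

    # Substring matching, so "immediate" also catches "immediately"
    if any(phrase in lowered for phrase in explicit_commands):
        return "serious"
    if any(word in lowered for word in urgent_keywords + business_keywords):
        return "serious"
    return "free"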

Resources and References

  • Ollama: Local LLM runtime
  • Phi-3 Mini: Microsoft's 3.8B parameter model
  • Whisper: OpenAI's STT model
  • pyttsx3: Python TTS library
  • PyQt: GUI framework for desktop widget
  • SQLite: Local database for memory storage

Important Decisions Made

  • Rejected: Hume AI (consistency and cost concerns)
  • Rejected: Hybrid TTS systems (consistency issues)
  • Rejected: Cloud-only solutions (privacy and dependency concerns)
  • Chosen: Local-first architecture with cloud fallback
  • Chosen: Pure personality consistency over premium features
  • Chosen: New dedicated project structure

Date: June 19, 2025
Status: Ready to begin Autumn_AI_Assistant implementation
Next Action: Create new project directory and begin Phase 1 development