Releases: aj47/SpeakMCP
SpeakMCP v1.4.0
New Features
- Mobile Settings Parity: Added settings for skills, memories, personas, and loops on mobile
- Supertonic TTS: Added Supertonic as a new local text-to-speech option
- Past Sessions Modal: Moved past sessions from sidebar to a modal with delete support
- Agent Spoken Output: Added explicit `respond_to_user` spoken-output flow for multi-channel support
- Skills in Prompts: Enabled skills appear in predefined prompts dropdown menu
- Ephemeral Messages: Hide internal completion nudge via ephemeral message system
- Continue Conversation Shortcuts: Added Shift+hotkey keybinds to continue last conversation
- Unlimited Agent Runs: Added option to disable max iteration limit for unlimited agent loops
- Transcription Preview: Added opt-in live transcription preview during recording
- Mobile Session Sync: Sync sessions between mobile and desktop with lazy loading
Bug Fixes
- Fixed React deduplication and streaming repetition bugs on mobile
- Fixed duplicate assistant messages and preserved mobile progress on empty history
- Fixed TTS playback issues: stale generation results, double playback on remount, cleanup timing
- Fixed streaming state getting stuck in "generating" when throttle drops events
- Fixed sidebar collapse button spacing on macOS
- Fixed search icon alignment and standardized icon-text spacing across UI
- Fixed dialog grid children width overflow
- Fixed input drafts sync overwriting user typing
- Fixed mobile transcription: word-boundary matching, StrictMode compat, focus timeouts
- Fixed hold-mode race conditions in continue-conversation shortcuts
- Fixed WebM to float32 PCM decoding for Parakeet STT
- Made network retry delays interruptible by the kill switch
- Bundled sherpa-onnx native packages properly in packaged app
- Gated SpeakMCP-specific fetches behind `isSpeakMCPServer` check
Improvements
- Consolidated duplicated summarization settings UI
- Session tiles now fill available vertical space
- Final assistant messages expand by default in tile/sessions view
- Grid view click now resets tile layout (removed separate reset button)
- Per-message TTS on mobile
Downloads
- macOS (Apple Silicon): `SpeakMCP-1.4.0-arm64.dmg` - Signed and notarized
- Android: `SpeakMCP-1.4.0.apk`
v1.3.0
SpeakMCP v1.3.0
🚀 New Features
- Agent Loops System — Schedule agents to run with specific prompts at regular intervals (#1036)
- Terminal QR Code — QR code rendering for mobile pairing in headless/SSH/terminal environments (#1025)
- Collapsible Queued Messages — Message queue panel is now collapsible and height-limited (#1042)
- Rapid Fire Voice (Mobile) — Hold-to-speak rapid fire voice input with large mic button (#1024)
- Conversation Keybinds — Shift+hotkey shortcuts for conversation navigation (#1021)
- Supertonic TTS — New TTS provider support (#1000)
- Disable Max Iteration Limit — Option to remove agent iteration cap (#1017)
- Google Assistant Integration (Mobile) — Trigger SpeakMCP via "Hey Google" App Actions (#1022)
- File-Based Dynamic Context Discovery — Efficient token usage via file-synced MCP tools, profiles, and skills (#897)
- Debug Logging — Comprehensive debug logging feature with file management and UI (#178)
- Persistent Cloudflare Tunnel URL — Store and display last known tunnel URL for reconnection (#723)
- Bundled Electron MCP Skill — Bootstrap skill for local Electron MCP server (#1056)
- Standalone Server Package — Central server architecture for multi-client support (#790, #791)
🐛 Bug Fixes
- Desktop Kill Switch — Fixed kill switch not stopping agent sessions; now session-aware with interruptible retry delays (#1058, #255, #1023)
- Mobile Session Sync — Fixed mobile sessions failing to load messages on desktop (#1059)
- Mobile Session Polling — Added foreground session polling so new desktop sessions are detected (#1055)
- Follow-up Message Display — Messages now appear immediately after session stop (#1057)
- Claude Models via OpenRouter — Prevent assistant message prefill error (#1037)
- Mobile Streaming — Fixed double words in streaming responses (#1028)
- Model Search Sticky — Search input stays sticky in settings dropdown (#1041)
- Waveform Height — Increased waveform visualization height, prevent excessive shrink (#1052)
- Settings Sidebar Scroll — Settings panel now scrolls with sidebar (#1051)
- Mobile Rapid Fire UX — Improved session visibility and voice feedback (#1038)
- Verification Corruption — Prevent verification from corrupting final agent response (#1050)
- Renderer Crash Recovery — Auto-recover from GPU/renderer process crashes (#810)
- Mobile Network Failures — Graceful error handling with retry for app backgrounding (#489)
- Hardened Runtime — Enabled by default for macOS permission persistence (#847)
⚡ Performance
- Async Index + Throttled Progress — Async+debounced conversation index writes & throttled progress emits for dramatically reduced main process blocking (#1060)
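The throttling half of this change can be pictured with a small sketch. It is not SpeakMCP's actual code: the `ProgressUpdate` shape, the `createThrottledEmitter` name, and the rule that completion updates always bypass the throttle are assumptions made for illustration.

```typescript
// Hypothetical shape of a progress update; not SpeakMCP's actual type.
interface ProgressUpdate {
  sessionId: string;
  step: number;
  isComplete: boolean;
}

// Returns a function that forwards at most one update per `intervalMs`.
// Completion updates are always forwarded (an assumption of this sketch),
// so the receiver cannot be left waiting for a final state.
function createThrottledEmitter(
  emit: (update: ProgressUpdate) => void,
  intervalMs: number,
  now: () => number = Date.now,
): (update: ProgressUpdate) => void {
  let lastEmitAt = Number.NEGATIVE_INFINITY;
  return (update) => {
    const t = now();
    if (update.isComplete || t - lastEmitAt >= intervalMs) {
      lastEmitAt = t;
      emit(update);
    }
    // Otherwise the update is dropped; a later one carries newer state.
  };
}
```

Passing the clock in as `now` keeps the sketch deterministic, which makes the drop/forward behavior easy to check without real timers.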
🎨 UI Improvements
- Sessions Sort by Last Modified (#1016)
- Session Tiles Fill Vertical Space (#1018)
- Search Bar Model Dropdown Fix (#1015)
- Groq API Pricing Research (#1009)
- Improved Agent Documentation (#1010)
📦 Downloads
| Platform | File |
|---|---|
| macOS (Apple Silicon) | SpeakMCP-1.3.0-arm64.dmg |
| macOS (Intel) | SpeakMCP-1.3.0-x64.dmg |
| Android | SpeakMCP-1.3.0.apk |
Note: macOS DMGs are code-signed and notarized by Apple for safe installation.
v1.2.0
🎯 Major Features
🎭 Agent Personas & Multi-Agent Delegation (#920)
- Agent Personas - Create specialized AI personas with custom system prompts, tools, and skills
- Delegation System - Route tasks to specialized sub-agents based on expertise
- Agent Profile Management - Full CRUD for managing delegation targets
- Internal & External Agents - Support for both built-in personas and external agents
- Settings UI - Configure at Settings → Agent Personas
🤖 External ACP Agents (#894, #920)
- ACP Protocol Support - Connect to external AI agents via Agent Client Protocol
- Claude Code Integration - Delegate coding tasks to Claude Code
- Auggie Support - Connect to Augment's Auggie agent
- Multiple Transport Types:
  - `stdio` - Spawn local agent processes
  - `remote` - Connect to HTTP endpoints
  - `internal` - Built-in delegation within SpeakMCP
- Delegation Tools:
  - `list_available_agents` - Discover available specialized agents
  - `delegate_to_agent` - Route tasks to specific agents
  - `get_delegation_status` - Check on delegated task progress
🧠 Dual-Model Agent Mode (#919)
- Strong Model for Planning - Use powerful models for complex reasoning
- Weak Model for Summarization - Use faster/cheaper models for UI summaries
- Agent Summary View - See compact summaries of agent progress
- Memory System - Save important findings to persistent memory files
- Settings UI - Configure at Settings → Providers & Models
💾 Agent Memory System (#919, #963, #975)
- Persistent Memories - Save key information across sessions
- Ultra-Compact Format - Single-line memories for efficiency
- Agent Memory Tools:
  - `list_memories` - View saved memories
  - `save_memory` - Create new memories
  - `delete_memory` - Remove memories
  - `delete_multiple_memories` - Bulk delete
  - `delete_all_memories` - Clear all memories
- Bulk Delete UI - Select and delete multiple memories in Settings
🧠 Agent Skills System (#895, #958)
- Skills Service - Modular skills system for enhancing AI capabilities
- Per-Profile Skills - Each profile can have its own set of skills
- Import from GitHub - Import skills directly from GitHub repositories
- Import from Local Folders - Add skills from local directories
- Progressive Loading - Skills load on-demand to reduce token usage
- Auto-Refresh - New skills detected automatically without restart
- Proactive Context - Skills injected into system prompt automatically (#942)
- Bundled Skills - Includes "Agent Skill Creation" meta-skill
📊 Langfuse Observability Integration (#929, #941, #947)
- LLM Call Tracing - All LLM calls traced with model, prompts, responses, and token usage
- Agent Session Traces - Complete agent workflows tracked from start to finish
- MCP Tool Call Spans - Each tool invocation logged with inputs/outputs
- Sessions Support - Group traces by conversation for multi-turn debugging
- Profile Tags - Filter traces by profile name in Langfuse dashboard
- Optional Dependency - Install langfuse only when needed
- Settings UI - Configure via Settings > General > Langfuse Observability
🔗 Persistent Cloudflare Tunnel URLs (#922, #954)
- Named Tunnels - Persistent URLs that remain the same across restarts
- Quick Tunnels - Existing random URL functionality preserved
- Auto-Start on Launch - Tunnel can start automatically when app opens
- Tunnel Mode Selector - Choose between Quick and Named tunnels in UI
- Available Tunnels List - Shows existing tunnels when logged in
📱 Mobile & Cross-Device Features (#962, #972)
- Chat Sync - Sync chat state between desktop and mobile app
- Conversation Continuity - Continue conversations across devices
- Compact Chat UI - Single-line collapsed view for mobile
- Pull-to-Refresh - Sync with desktop in real-time
🔧 Inter-Agent Communication (#959)
- `send_agent_message` - Send messages between running agent sessions
- Agent Coordination - Enables collaborative multi-agent workflows
- `list_running_agents` - Discover active agent sessions
📱 WhatsApp Harness Improvements (#910, #905)
- Message Handling in MCP Server - Moved message logic from desktop harness to MCP server
- Conversation ID Persistence - Tracks conversations across sessions
- `/new` Command - Start fresh conversations with `/new` command
- Automatic Typing Indicator - Shows typing when agent starts processing
- Harness Output Mode - Configurable auto-response via harness layer
- WhatsApp Toggle - Enable/disable in main settings UI (#934)
💾 Context Compaction & Memory Management (#908, #909)
- Persistent Compaction - Older messages summarized and saved to disk
- No More Re-summarization - Summaries persist across sessions
- Conversation Compaction on Load - Summarizes older messages when exceeding 20 messages
- MCP Process Cleanup - Properly terminates MCP server processes on app quit
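A rough sketch of the compaction-on-load rule, assuming older messages are folded into a single summary message and a fixed number of recent messages is kept verbatim. Only the 20-message threshold comes from the notes above; the `ChatMessage` type, the `keepRecent` value, and the synchronous `summarize` callback are hypothetical (a real summarizer would most likely be an asynchronous LLM call).

```typescript
// Hypothetical message shape; not SpeakMCP's actual type.
interface ChatMessage {
  role: "user" | "assistant" | "system";
  content: string;
}

const COMPACTION_THRESHOLD = 20; // from the release note
const KEEP_RECENT = 10; // assumption: how many recent messages stay verbatim

// Replaces everything except the most recent messages with one summary
// message once the conversation exceeds the threshold.
function compactOnLoad(
  messages: ChatMessage[],
  summarize: (older: ChatMessage[]) => string,
  threshold: number = COMPACTION_THRESHOLD,
  keepRecent: number = KEEP_RECENT,
): ChatMessage[] {
  if (messages.length <= threshold) {
    return messages; // nothing to compact
  }
  const older = messages.slice(0, messages.length - keepRecent);
  const recent = messages.slice(messages.length - keepRecent);
  const summary: ChatMessage = { role: "system", content: summarize(older) };
  return [summary, ...recent];
}
```

Persisting the returned array to disk is what would avoid re-summarizing on the next load; that step is left out here.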
🎮 Profile CRUD Tools (#938)
- `create_profile` - Create new profiles programmatically
- `update_profile` - Modify existing profiles
- `delete_profile` - Remove profiles with safeguards
- `duplicate_profile` - Copy profiles including all configurations
📐 Model Registry with Fuzzy Matching (#907)
- ~100 Models Supported - Comprehensive registry for context window detection
- Fuzzy Matching - Correctly identifies models through proxies (e.g., Claude via OpenAI-compatible endpoints)
- Providers Covered - Anthropic, OpenAI, Google, xAI, DeepSeek, Mistral, Qwen, Llama
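One plausible way to do this kind of matching is to normalize the incoming model ID and pick the longest registry key it contains. The sketch below assumes that approach; the `matchModel` helper, the normalization rule, and the three registry entries are illustrative stand-ins, not the actual registry (check context-window figures against provider documentation).

```typescript
// Illustrative registry entries only; the real registry covers ~100 models.
const CONTEXT_WINDOWS: Record<string, number> = {
  "claude-3-5-sonnet": 200000,
  "gpt-4o": 128000,
  "gpt-4o-mini": 128000,
};

// Lowercase and collapse every run of non-alphanumerics into "-", so that
// "anthropic/claude-3.5-sonnet" and "claude-3-5-sonnet" compare equal.
function normalizeModelId(id: string): string {
  return id.toLowerCase().replace(/[^a-z0-9]+/g, "-");
}

// Returns the longest registry key contained in the normalized ID, if any.
function matchModel(
  modelId: string,
): { key: string; contextWindow: number } | undefined {
  const needle = normalizeModelId(modelId);
  let best: string | undefined;
  for (const key of Object.keys(CONTEXT_WINDOWS)) {
    const found = needle.indexOf(key) !== -1;
    if (found && (best === undefined || key.length > best.length)) {
      best = key;
    }
  }
  return best === undefined
    ? undefined
    : { key: best, contextWindow: CONTEXT_WINDOWS[best] };
}
```

Preferring the longest key keeps `gpt-4o-mini` from being mistaken for `gpt-4o` when a proxy adds prefixes or date suffixes.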
🛠️ MCP Tool Discovery (#948, #950)
- Lazy Loading - Lightweight system prompt with on-demand tool details
- `list_server_tools` - Get all tools from a specific MCP server
- `get_tool_schema` - Get full JSON schema for any tool
- Dynamic Tool Filtering - Reduce tool overhead from 64 to ~20 per call
🚀 UX & UI Improvements
Session Tiles
- Expand to Full Window (#915) - New button to expand session tile to fill entire window
- Clickable Title Area (#886) - Click anywhere on title to collapse/expand
- Removed Max Height Constraint (#927) - Tiles can now fill available vertical space
- Proper Sizing Transitions (#914) - Panel resizes correctly when switching from voice to agent mode
Floating Panel
- Auto-Hide When Main Focused (#887) - Panel hides when main window has focus
- Cleaner UI (#904) - Removed ESC hint and tool call progress indicator
- Responsive Hotkey Hints (#898, #980) - Hints hide on narrow screens and during sessions
- Drag Fix (#978) - Panel no longer closes during drag/button interactions
Sidebar & Navigation
- Improved Sidebar UI (#930) - Added 'Past' label, simplified search placeholder
- Past Sessions Cleanup (#891) - Removed redundant title text
- Agent Mode Keybind Visibility (#892) - Aura keybind now visible on sessions page
Mobile App
- Compact Tool Calls (#932, #972) - Tool calls collapse to single line
- Session Title Fix (#931) - Correct title shows instead of 'Transcribing...'
- Compact Chat UI (#972) - Single-line collapsed view matching desktop styling
Streamer Mode (#893)
- Hide Sensitive Info - Masks phone numbers, API keys, QR codes, and URLs
- Global Indicator - Shows in sidebar footer when active
- Privacy During Streaming - Protect sensitive data when screen sharing
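As a minimal sketch of the masking idea, assuming it is done with pattern replacement on displayed text: the patterns, the `sk-` key prefix, the `[hidden]` placeholder, and the `maskSensitive` name are all assumptions for illustration, and QR-code hiding (which is not text) is out of scope here.

```typescript
const MASK = "[hidden]";

// Order matters: URLs are masked before phone numbers so that digits
// inside a URL are not matched separately.
const SENSITIVE_PATTERNS: RegExp[] = [
  /\bsk-[A-Za-z0-9_-]{8,}\b/g, // API-key-like tokens (assumed "sk-" prefix)
  /https?:\/\/[^\s"'<>]+/g, // URLs
  /\+?\d[\d\s().-]{7,}\d/g, // phone-number-like digit runs
];

// Returns `text` unchanged when streamer mode is off, otherwise replaces
// every match of every pattern with the mask.
function maskSensitive(text: string, streamerModeEnabled: boolean): string {
  if (!streamerModeEnabled) {
    return text;
  }
  return SENSITIVE_PATTERNS.reduce(
    (masked, pattern) => masked.replace(pattern, MASK),
    text,
  );
}
```

For example, `maskSensitive("call +1 (555) 123-4567 now", true)` yields `call [hidden] now`.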
🔧 API & Settings Improvements
Feature Toggle API (#939)
- `verificationEnabled` - Query/toggle verification setting
- `messageQueueEnabled` - Query message queue setting
- `parallelToolExecutionEnabled` - Query parallel execution setting
- `toggle_verification` - New tool to enable/disable verification
New Builtin Tools
- `read_media_file` - Read images/audio as base64 for multimodal LLM input
- `load_skill_instructions` - Fetch full skill instructions on-demand
Settings Reorganization
- Langfuse Moved (#949) - Settings moved from sidebar to General Settings page
- Removed Auto-Configured Filesystem Server (#961) - Use `execute_command` for filesystem operations
🐛 Bug Fixes
Agent Loop & Completion (#946, #967, #970)
- Fixed empty response retry loops with counter and limit
- Fixed verification loops incrementing fail count in all paths
- Fixed tool context loss during context shrinking
- Simplified LLM integration trusting model's completion signals
- Added tool timeout handling (20s default, 120s for browser automation)
- Fixed parallel call race conditions
Verification Logic (#951)
- Fixed direct verifier call bypassing safety features
- Fixed duplicate messages in verification context
- Fixed loop detection false positives on short responses
- Fixed empty tool results handling
- Unified all verification paths
Voice & Recording
- Mic Button Fix (#906) - Voice recording now takes precedence over text input
- Waveform Fix (#976) - Waveform now shows for normal voice dictation mode
Model & Preset
- Summarization Model Sync (#977) - Summarization model updates when switching presets
Memory & Processes
- MCP Process Cleanup (#909) - No more orphaned node processes (2GB+ memory leak fixed)
📦 Downloads
Cross-Platform Support: macOS (Apple Silicon & Intel), Windows, Linux, Android, iOS
macOS Builds (Signed)
- DMG: `SpeakMCP-1.2.0-arm64.dmg` | `SpeakMCP-1.2.0-x64.dmg`
Android
- APK: `SpeakMCP-1.2.0.apk`
Linux
- AppImage: `SpeakMCP-1.2.0.AppImage`
🔄 Migration Notes
- No breaking changes - All existing functionality preserved
- Automatic migration - Settings and data migrate seamlessly
- New features opt-in - All new features work with existing configurations
- Backward compatible - Existing API endpoints and data structures un...
v1.1.0
🎯 Major Features
🤖 Vercel AI SDK Migration (#812)
- Simplified LLM Code - Removed ~1,100 lines of custom HTTP/fetch logic
- Provider Flexibility - Easy to add new providers (Anthropic, etc.)
- Better Streaming - AI SDK handles streaming protocols natively
- Type Safety - Better TypeScript support from AI SDK
- Maintained Compatibility - Same public API, MCP tools work unchanged
📋 Kanban View for Sessions (#807)
- Three-Column Layout - Idle, In Progress, and Done columns
- Visual State Indicators - Clear status for each session
- View Toggle - Switch between Grid and Kanban views
- Session Organization - Better workflow management
📝 Predefined Prompts (#809)
- Save Frequent Prompts - Quick access to commonly used prompts
- One-Click Insert - Click to insert prompt into input field
- Full Management - Add, edit, and delete saved prompts
- Persistent Storage - Prompts saved across sessions
📱 Mobile Settings Management (#744)
- Profile Switching - Switch between profiles directly from mobile app
- MCP Server Management - View connection status and enable/disable servers remotely
- Feature Toggles - Control post-processing, TTS, and tool approval from mobile
- Pull-to-Refresh - Sync settings with desktop in real-time
- New API endpoints: `/v1/profiles`, `/v1/mcp/servers`, `/v1/settings`
🔍 MCP Registry Integration (#785)
- Official Registry Browser - Discover 100+ MCP servers from the official registry
- One-Click Installation - Add servers with a single click
- Smart Search - Find servers by name and description
- Server Type Badges - npm, PyPI, Docker, Remote server indicators
- 5-Minute Caching - Reduced API calls for better performance
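The 5-minute caching can be sketched as a single cached value with a timestamp. This is an assumption about the approach, not the actual implementation: `createTtlCache` is a hypothetical helper, and the loader is synchronous here for brevity, whereas a real registry fetch would be asynchronous.

```typescript
const FIVE_MINUTES_MS = 5 * 60 * 1000;

// Wraps `load` so that its result is reused until `ttlMs` has elapsed.
function createTtlCache<T>(
  load: () => T,
  ttlMs: number = FIVE_MINUTES_MS,
  now: () => number = Date.now,
): () => T {
  let cached: { value: T; fetchedAt: number } | undefined;
  return () => {
    const t = now();
    if (cached === undefined || t - cached.fetchedAt >= ttlMs) {
      // Cache is empty or stale: reload and remember when we did so.
      cached = { value: load(), fetchedAt: t };
    }
    return cached.value;
  };
}
```

Repeated opens of the registry browser within the TTL would then reuse the cached server list instead of calling the registry API again.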
📤 Enhanced Profile Export/Import (#772)
- MCP Server Definitions - Export now includes all enabled MCP server configurations
- Model Settings - Export includes model configuration settings
- Smart Import - Merges MCP definitions without overwriting existing config
- Easy Sharing - Share complete profiles with team members
🚀 Performance & UX Improvements
Recording Latency Reduction (#734)
- 550ms faster - Reduced hold-to-record delay from 800ms → 250ms
- Overlapped initialization - Start recording before showing panel UI
- Snappier response - Faster feedback when holding Ctrl
Mobile Text Interaction (#735)
- Expandable/Collapsible Text - Tap anywhere on collapsed text to expand
- Selectable Content - Copy LLM responses and tool parameters
- Better Tool Cards - Larger tap targets for expanding tool results
- Visual Feedback - Pressed states for better UX
Session Management (#739, #740)
- Always-Visible Start Buttons - Start new sessions anytime, even with active sessions
- Queueable Voice Input - Record voice messages during agent processing
- Message Queuing - Transcripts queue automatically when agent is busy
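The queuing behavior described above might look roughly like the following. The `VoiceMessageQueue` class and its method names are hypothetical; the sketch only illustrates the idea that transcripts arriving while the agent is busy wait their turn in order.

```typescript
// Hypothetical per-session queue; not SpeakMCP's actual implementation.
class VoiceMessageQueue {
  private readonly pending: string[] = [];
  private busy = false;
  private readonly runAgent: (transcript: string) => void;

  constructor(runAgent: (transcript: string) => void) {
    this.runAgent = runAgent;
  }

  // Starts the agent immediately when idle, otherwise queues the transcript.
  submit(transcript: string): void {
    if (this.busy) {
      this.pending.push(transcript);
      return;
    }
    this.busy = true;
    this.runAgent(transcript);
  }

  // Called when the agent finishes; starts the next queued transcript, if any.
  onAgentFinished(): void {
    const next = this.pending.shift();
    if (next === undefined) {
      this.busy = false;
      return;
    }
    this.runAgent(next);
  }

  queuedCount(): number {
    return this.pending.length;
  }
}
```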
UI Polish (#733, #738, #800, #801, #806, #811)
- Stop Sign Icon - Changed kill switch from X to OctagonX for clarity
- Collapsed Servers - MCP servers collapsed by default for cleaner UI
- Kill Switch in Follow-ups - Stop button now in follow-up input panels
- Responsive MCP Modal (#800) - Tool details modal now scrollable and responsive
- Improved Sidebar Layout (#801) - Settings above sessions, sessions scroll to bottom
- Sessions Icon (#806) - Quick navigation icon in collapsed sidebar
- Edit Profiles Shortcut (#811) - Direct link to profile settings in dropdown
Mobile Improvements (#794, #816)
- Android Branding (#794) - Updated app name, icons, and `speakmcp://` deep linking
- Conversation Recovery (#816) - Recover server state on connection retry (no duplicates)
🔧 Code Quality & Maintenance
Major Refactoring Sprint (#775, #776, #777, #778, #779, #780)
- keyboard.ts Modularization (#775) - Split 1,170 lines into focused modules
- MCP Config Manager (#776) - React Context to eliminate prop drilling
- tipc.ts Split (#777) - 66% reduction (2,913 → 974 lines) into 14 domain modules
- MCP Service Refactor (#778) - Extracted into focused, maintainable modules
- ChatScreen Components (#779) - Mobile app code organization with custom hooks
- LLM Provider Abstraction (#780) - Clean provider interface for all LLM backends
LLM Code Consolidation (#781)
- Removed structured-output.ts - Consolidated unused code (~285 LOC reduction)
- Moved makeStructuredContextExtraction - Relocated to llm-fetch.ts
Shared Package Improvements (#773)
- Type Consolidation - AgentProgressStep, AgentProgressUpdate moved to shared
- New Utilities - formatDuration, formatTimestamp, statusColors
- Custom Hooks - useCollapsibleState, useCollapsibleSet for UI state
- Dependency Cleanup - Proper runtime vs dev dependency classification
Testing Infrastructure (#770)
- E2E Tests - Playwright tests for Electron with custom fixtures
- Smoke Tests - App launch and basic navigation tests
- Settings Tests - Configuration and provider testing
- MCP Tests - Server management and session tests
- CI Integration - GitHub Actions workflow for automated testing
Refactoring Issue Templates (#745)
- Created 9 detailed issue templates for future refactoring work
- Includes proposals for tipc.ts, mcp-service.ts, keyboard.ts modularization
🐛 Bug Fixes
LLM & Provider
- Empty Response Handling (#793, #797) - Fixed false "Network error" for valid empty completions
- Provider Name Display (#737) - Show actual preset name (OpenRouter, Together AI) instead of generic "OpenAI"
- Groq TTS Update (#784) - Updated to Orpheus models (PlayAI deprecated)
Mobile & Desktop
- Disabled Server Tools (#743) - Hide tools from disabled MCP servers
- JSX Nesting (#728) - Fixed parse errors in MCP config manager
- Tunnel Persistence (#722) - Auto-reconnect mobile app on restart with stable device ID
UI/UX
- Tool Collapse (#713) - Collapsible server groups in Tools section with state persistence
- Mic Button (#732) - Mic clickable during agent processing with message queuing
📊 Stats
- 40+ PRs merged since v1.1.0
- ~3,000+ lines refactored - Major code quality improvements
- AI SDK migration - Simplified LLM integration
- E2E testing infrastructure - New Playwright test suite
- Improved mobile experience - Settings management, conversation recovery, Android branding
🔄 Migration Notes
- No breaking changes - All existing functionality preserved
- Automatic migration - Settings and data migrate seamlessly
- New features opt-in - All new features work with existing configurations
- Backward compatible - Existing API endpoints and data structures unchanged
📥 Downloads
Cross-Platform Support: macOS (Apple Silicon & Intel), Windows, Linux, Android, iOS
macOS Builds
- DMG: `SpeakMCP-1.2.0-arm64.dmg` | `SpeakMCP-1.2.0-x64.dmg`
- PKG: `SpeakMCP-1.2.0-arm64.pkg` | `SpeakMCP-1.2.0-x64.pkg`
- ZIP: `SpeakMCP-1.2.0-arm64.zip` | `SpeakMCP-1.2.0-x64.zip`
🙏 Acknowledgments
Thanks to all contributors and users who provided feedback!
Full Changelog: v1.1.0...v1.2.0
Jan 1st update:
New Features:
• Reset Layout Button - Restores agent tiles to default dimensions in grid view
• Profile Import/Export (Mobile) - Share API export + JSON paste import
• Profile Import/Export (Desktop) - Added missing Import/Export buttons
• GitHub Actions - New workflow for Linux/Windows builds
Bug Fixes:
• Agent premature completion fix (LLM verifier always runs when enabled)
• Text input hotkey fix (Ctrl+T no longer triggers voice timer)
• Blank hover panel fix for completed snoozed sessions
• Text input panel resize after waveform recording
• Infinite refetch loop prevention on mobile
• Verifier prompt bias fix
• Safe error message checking
• Profile import error handling separation
License: AGPL-3.0
SpeakMCP v1.0.0
SpeakMCP v1.0.0 - Initial Release 🎉
The first official release of SpeakMCP - an AI-powered dictation tool with MCP (Model Context Protocol) integration.
Features
- 🎤 Voice Dictation - Hold Ctrl to record and transcribe your voice
- 🤖 AI-Powered - Integrates with OpenAI, Anthropic, Google, and more
- 🔧 MCP Tools - Connect to Model Context Protocol servers for extended functionality
- 📱 Cross-Platform - Available for macOS (Intel & Apple Silicon) and Android
- 🎯 Agent Mode - Multi-step AI agent with tool calling capabilities
- 🔒 Privacy-First - All processing happens locally
Downloads
Windows
There are now .exe setup and .exe portable builds for Windows!
macOS
- Apple Silicon (M1/M2/M3): `SpeakMCP-1.0.0-arm64.dmg`
- Intel: `SpeakMCP-1.0.0-x64.dmg`
- ZIP and PKG installers also available
Android
- APK: `SpeakMCP-1.0.0-android.apk`
Installation
macOS
- Download the appropriate DMG for your Mac
- Open the DMG and drag SpeakMCP to Applications
- On first launch, you may need to right-click and select "Open" due to Gatekeeper
- Grant accessibility and microphone permissions when prompted
Android
- Download the APK file
- Enable "Install from unknown sources" in your device settings
- Install the APK
- Grant microphone permissions when prompted
Requirements
- macOS: 12.0 (Monterey) or later
- Android: Android 7.0 (API level 24) or later
- API key for at least one AI provider (OpenAI, Anthropic, etc.)
Getting Started
- Launch SpeakMCP
- Enter your AI provider API key in Settings
- Hold Ctrl to start recording
- Release Ctrl to transcribe and get AI response
- Optional: Configure MCP servers for extended capabilities
Thank you for trying SpeakMCP! Please report any issues on GitHub.
v1.0.0 - Mobile App & Parallel Agent Sessions
🚀 SpeakMCP 1.0 is here!
This release marks a major milestone with two flagship features: the new Mobile App for voice-controlled AI on the go, and Parallel Agent Sessions for running multiple AI agents simultaneously.
⚠️ Platform Support
Windows and Linux do not currently support MCP tools in this release.
For Windows and Linux users who want dictation-only functionality, please use v0.2.2 which includes Windows (.exe) and Linux (.AppImage, .deb, .snap) builds.
📱 Mobile App
Control your AI agents from anywhere! The SpeakMCP mobile app connects to your desktop via the Remote Server API.
Features
- Voice input - Speak commands to your AI agent from your phone
- Real-time progress - Watch your agent work in real-time
- Conversation continuity (#401) - Continue conversations seamlessly between mobile and desktop
- Emergency kill switch (#398) - Stop all agents instantly with `/v1/emergency-stop` endpoint
- QR code setup - Scan a QR code to configure the mobile app instantly
Monorepo Architecture (#417)
- Converted to pnpm workspace structure
- Mobile app lives at `apps/mobile/`
- Desktop app at `apps/desktop/`
- Shared design tokens package (`@speakmcp/shared`) for consistent styling
- Unified development workflow: `pnpm dev` for desktop, `pnpm dev:mobile` for mobile
Cloudflare Tunnel Integration (#363)
- One-click internet exposure - No port forwarding or network config needed
- Cloudflare Quick Tunnel (no account required)
- QR code generation with `speakmcp://` deep links
- Instant mobile app configuration by scanning
🔀 Parallel Agent Sessions
Run multiple AI agents at the same time! The new tiling session dashboard lets you manage multiple concurrent agent sessions.
Tiling Session Dashboard (#359)
- Sessions dashboard is now the landing page
- Responsive tiling layout (1=full width, 2=50/50, 3+=responsive grid)
- Each tile shows full conversation with internal scroll
- New sessions animate into the grid
- Tile size persistence (#410) - Tiles remember their size
- Default tile width optimized (#407) - Two tiles fit side-by-side
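The tiling rule (1 = full width, 2 = 50/50, 3+ = responsive grid) can be expressed as a column-count function. The notes do not say how the responsive grid is sized, so the minimum tile width and the `tileColumns` helper below are assumptions for illustration only.

```typescript
// Returns how many columns the dashboard grid should use.
// `minTileWidthPx` is a hypothetical value, not taken from the release notes.
function tileColumns(
  sessionCount: number,
  containerWidthPx: number,
  minTileWidthPx: number = 400,
): number {
  if (sessionCount <= 1) {
    return 1; // a single session spans the full width
  }
  if (sessionCount === 2) {
    return 2; // two sessions split the width 50/50
  }
  // Three or more: fit as many minimum-width tiles as the container allows,
  // but never more columns than sessions and never fewer than one.
  const fit = Math.max(1, Math.floor(containerWidthPx / minTileWidthPx));
  return Math.min(sessionCount, fit);
}
```

The result could feed a CSS grid such as `grid-template-columns: repeat(n, 1fr)`, with user-resized tile dimensions layered on top.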
Unified Sessions & History (#429)
- History tab merged into Sessions page
- Past sessions in collapsible section
- Click any past session to open as new tile
- Search and filter past sessions
- Lazy loading with "Load More"
Multi-Session Controls
- Panel hide button (#405) - Minimize entire floating panel when multiple sessions active
- Sidebar session navigation (#438) - Click sidebar session to scroll to its tile
- Continue from tiles (#408, #381) - Follow-up stays in same tile, no new window spawns
- Active agents height limit (#406) - Prevents overlap with macOS window controls
Streaming Output (#388)
- See LLM responses as they're generated in real-time
- Live streaming display with animated cursor
- Auto-scroll follows streaming content
🎯 Other Major Features
Built-in Settings Tools for Self-Configuration (#386)
The agent can now configure itself! 5 new built-in tools:
- `list_mcp_servers` - View all configured MCP servers with status
- `toggle_mcp_server` - Enable/disable MCP servers by name
- `list_profiles` - View all profiles and which is active
- `switch_profile` - Switch between profiles by ID or name
- `get_current_profile` - Get current profile with full guidelines
Per-Profile MCP Server Configurations (#394)
- Each profile stores its own MCP server settings
- Automatically apply MCP config when switching profiles
- Different profiles can have different enabled servers/tools
Editable Base System Prompt (#431)
- Edit the base system prompt in Agent Settings
- Per-profile system prompt storage
- One-click restore to default prompt
Direct Response Support (#437)
- Agent answers simple questions without forcing tool calls
- Reduced latency for Q&A interactions
- Smarter detection of when tools are actually needed
System Prompt Optimization (#432)
- ~50% token reduction in system prompts
- Removed dead code and redundant instructions
- Cleaner, more focused prompts
🔧 UI/UX Improvements
Text/Voice Follow-up Inputs (#383, #376)
- New input component in floating agent progress overlay
- Text input field + Submit button + Voice button
- Continue conversations via text OR voice
Profile Management Redesign (#425, #426)
- Profile dropdown in sidebar with full management
- Create, edit, delete, import/export from dropdown
- Create new profile directly from dropdown
Always-On Agent Mode (#428)
- MCP tools and agent mode now always enabled
- Removed unnecessary toggle switches
- Safety settings moved to General Settings
Tool Calling UX (#337)
- Space to approve, Escape to deny tool calls
- Better parameter previews
- Hotkey hints on buttons
Other UI Fixes
- Rate limit retry banner with countdown (#369)
- Copy button for agent responses (#333)
- Settings reorganization (#348, #362)
- Submit hint shows 'Enter' for mic button (#396)
🐛 Bug Fixes
Agent & Session
- Panel now focusable when agent completes (#435)
- Text input responsive after agent finishes (#422)
- Killswitch properly closes panel during MCP init (#340)
- Final output summary expanded by default (#341)
- Floating GUI shows on voice keybind (#375)
- Scrollbars hidden until hover (#382)
- Maximize button no longer creates blank panel (#372)
Profile & Settings
- Profile text no longer truncated after save (#427)
- Panel doesn't appear when continuing from tiles (#413)
LLM & Provider
- Model preferences persist across sessions (#364)
- Empty content with toolCalls accepted (#346)
- Verifier JSON schema fixes (#347)
- Cerebras API compatibility (#352)
Logging
Development
🧹 Code Quality
Testing
- 56 new key-utils tests (#421)
- Total tests: 32 → 88 (+175%)
Cleanup
- Removed ~941 lines of unused code (#329)
- Debug logging cleanup (#330)
- DEBUGGING.md distilled from 425 to 48 lines (#374)
📊 Stats
- 50+ PRs merged since v0.3.0
- 100+ commits
- 88 tests (up from 32)
- Version: 0.3.0 → 1.0.0
🙏 Acknowledgments
Thanks to all users who provided feedback!
Key Pull Requests:
- #438 - Sidebar session navigation
- #437 - Direct response support
- #435 - Panel focusability fix
- #432 - System prompt optimization
- #431 - Editable system prompt
- #429 - Unified sessions & history
- #428 - Always-on agent mode
- #427 - Profile sync fix
- #425 - Profile dropdown management
- #421 - Key-utils tests
- #417 - pnpm monorepo with mobile
- #410 - Tile size persistence
- #408 - Continue in same tile
- #405 - Panel hide button
- #401 - Mobile conversation continuity
- #398 - Emergency stop endpoint
- #397 - Final summary fixes
- #394 - Per-profile MCP configs
- #388 - Streaming output
- #386 - Built-in settings tools
- #383 - Follow-up inputs
- #363 - Cloudflare tunnel
- #359 - Tiling session dashboard
Full Changelog: v0.3.0...v1.0.0
Released: December 2025 | License: AGPL-3.0
v0.3.0 - Multi-Session Agent Support
⚠️ Platform Support
Windows and Linux do not currently support MCP tools in this release.
For Windows and Linux users who want dictation-only functionality, please use v0.2.2 which includes Windows (.exe) and Linux (.AppImage, .deb, .snap) builds.
🎯 Major Features
Multi-Session Agent Support (#264)
- Run multiple agent sessions concurrently with independent progress tracking
- Snooze/minimize sessions to background and restore from Active Agents sidebar
- Per-session killswitch - stopping one session doesn't affect others
- Eliminated "Initializing..." delay for new sessions
- Fixed state machine violations preventing generic "Processing..." states
🔧 Improvements
Session Management (#239)
- Voice input now defaults to creating new sessions for simplified workflow
- Configurable via `alwaysCreateNewSessionForVoice` setting
- Text input continues to support conversation continuation
UI/UX Enhancements
- Voice input waveform now visible when agent is running (#260)
- Text input automatically receives focus when spawned via keybind (Ctrl+T) (#258)
- Improved visibility of MCP transport type dropdown (#257)
- Assistant message and thinking block expansion state now persists when new messages arrive (#274)
- Each `<think>` section now has unique ARIA IDs for accessibility
Tool Management
- Tools from deleted MCP servers now properly removed from UI (#114)
- Tool call requests appear immediately in agent progress before responses (#202)
- Added Playwright MCP server as example for browser automation (#287)
Context Management (#304)
- Intelligent tool response processing to prevent context overflow
- Server-aware summarization (different strategies for Playwright, Desktop Commander, GitHub)
- Configurable thresholds (20KB/50KB defaults)
- Real-time progress feedback during large response processing
- UI now shows "Summarizing context (1/4)" during long operations instead of appearing frozen
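The threshold logic above might look roughly like the following. The 20KB/50KB defaults, the 15000 chunk size, and the progress label come from these notes; the function names, the action taken at each threshold, and measuring size in characters rather than bytes are all assumptions made for the sketch.

```typescript
// Defaults as listed in the release notes.
const LARGE_THRESHOLD = 20000;
const CRITICAL_THRESHOLD = 50000;
const CHUNK_SIZE = 15000;

// Hypothetical action names; the real strategies are server-aware.
type ResponseAction = "pass-through" | "summarize" | "chunk-and-summarize";

// Pick a strategy from the response length (assumed to be counted in characters).
function classifyToolResponse(text: string): ResponseAction {
  if (text.length >= CRITICAL_THRESHOLD) return "chunk-and-summarize";
  if (text.length >= LARGE_THRESHOLD) return "summarize";
  return "pass-through";
}

// Split an oversized response into fixed-size chunks.
function chunkResponse(text: string, chunkSize: number = CHUNK_SIZE): string[] {
  const chunks: string[] = [];
  for (let i = 0; i < text.length; i += chunkSize) {
    chunks.push(text.slice(i, i + chunkSize));
  }
  return chunks;
}

// Progress label in the format quoted in the notes.
function progressLabel(index: number, total: number): string {
  return `Summarizing context (${index}/${total})`;
}
```

Under these assumptions a 60,000-character response is split into four chunks, which is where a label such as "Summarizing context (1/4)" would come from.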
Performance
- Replaced polling with push-based events for Active Agents Sidebar (#298)
- Immediate UI updates, reduced log spam, lower CPU usage
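The difference between polling and push can be illustrated with a tiny publisher: subscribers are called only when something changes, so there is no timer and nothing to log between updates. `SessionEvents` is a hypothetical in-process stand-in; the actual app presumably pushes updates from the main process to the renderer, which the notes do not detail.

```typescript
type Listener<T> = (value: T) => void;

// Hypothetical publisher used only to illustrate push-based updates.
class SessionEvents<T> {
  private listeners = new Set<Listener<T>>();

  // Register a listener and return a function that removes it again.
  subscribe(listener: Listener<T>): () => void {
    this.listeners.add(listener);
    return () => {
      this.listeners.delete(listener);
    };
  }

  // Notify every current listener immediately.
  publish(value: T): void {
    this.listeners.forEach((listener) => listener(value));
  }
}
```

A sidebar component would subscribe once on mount and call the returned function on unmount.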
🐛 Bug Fixes
OAuth & Deep Links
- Fixed OAuth deep link callbacks on Windows and Linux by registering speakmcp:// protocol (#259, #225)
Agent State Management
- Kill switch now properly resets all agent state variables (#241)
- Old agent messages no longer appear when starting new sessions
- Session context no longer leaks between sessions (#294)
- Voice input no longer loads conversation history from previous sessions (#299)
- Completed sessions no longer block voice dictation (#301, #303)
- Waveform no longer shows unexpectedly after agent finishes (#292)
Display & Formatting
- Tool call expansion shows complete details instead of basic summaries (#142)
- TTS button state synchronization fixed (#140)
- Post-processing transcript auto-appends when {transcript} placeholder missing (#93)
- Waveform positioning fixed to span full width while centered (#109)
- Hide/Show buttons now work correctly in tool execution view (#289)
- Session tabs now display correct titles (#288)
- Progress UI shows immediately after voice input submission (#288)
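The {transcript} fallback from #93 can be sketched as follows. Only the placeholder name comes from the notes; `buildPostProcessingPrompt` and the blank-line separator used when appending are assumptions.

```typescript
const PLACEHOLDER = "{transcript}";

// Hypothetical helper: substitute the placeholder, or append the transcript
// when the template does not contain it.
function buildPostProcessingPrompt(template: string, transcript: string): string {
  if (template.indexOf(PLACEHOLDER) !== -1) {
    // Replace every occurrence of the placeholder.
    return template.split(PLACEHOLDER).join(transcript);
  }
  // Assumed separator: one blank line between the prompt and the transcript.
  return `${template}\n\n${transcript}`;
}
```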
LLM Handling
- Graceful fallback on empty LLM responses (#156)
- Parse non-standard reasoning field from providers like OpenRouter
- Fixed OpenAI-compatible provider model discovery
- Verifier no longer causes infinite loops on impossible tasks (#304)
- Panel UI crashes fixed with null safety guards (#304)
🔧 New Configuration Options
// Tool response processing (src/main/config.ts)
mcpToolResponseProcessingEnabled: true
mcpToolResponseLargeThreshold: 20000 // 20KB
mcpToolResponseCriticalThreshold: 50000 // 50KB
mcpToolResponseChunkSize: 15000
mcpToolResponseProgressUpdates: true
🧹 Code Quality
Refactoring
- Conversations section renamed to History for better semantic clarity (#149)
- Removed unnecessary "Resume auto-scroll" indicator (#158)
- Removed 'done', 'esc', and 'details' buttons from agent progress (#146)
- Added comprehensive tool calls test suite (#256)
- Improved TypeScript type safety with type-only imports
- Added comprehensive debug logging throughout
🔒 Security & Compatibility
- All changes maintain backward compatibility
- No breaking changes to existing functionality
- Enhanced error handling throughout
📊 Stats
- 60+ commits since v0.2.3
- 25+ PRs merged
🙏 Acknowledgments
Thanks to all users who provided feedback!
Key Pull Requests:
- #264 - Multi-session agent support
- #304 - Context overflow prevention and UI stability
- #298 - Push-based events for sidebar
- #294 - Session context isolation
- #292 - Recording cleanup fixes
- #289 - Tool execution UI fixes
- #288 - Session tabs and progress UI
- #287 - Playwright MCP example
- #274 - Expansion state persistence
- #260 - Waveform visibility fix
- #259 - OAuth deep links for Windows/Linux
- #258 - Text input focus fix
- #257 - MCP dropdown visibility
- #256 - Tool calls test suite
Full Changelog: v0.2.3...v0.3.0
Released: November 2025 | License: AGPL-3.0
v0.2.3
SpeakMCP v0.2.3
🎉 What's New
🤖 LLM & Model Improvements
Auto-Detection of Model Capabilities (#229)
- Self-Learning System: Automatically detects which models support structured output (JSON Schema/Object)
- Runtime Cache: Learns from actual usage and caches capabilities for 24 hours
- No Configuration Needed: Works with any new model automatically without hardcoding
- Fixes Infinite Retry Loops: Resolves issues with models like google/gemini-2.5-flash getting stuck
- Enhanced Debug Logging: 11 new debug points throughout LLM request/response flow (enable with DEBUG_LLM=1)
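A runtime capability cache with a 24-hour lifetime could be sketched like this. The 24-hour figure is from the notes; `CapabilityCache`, its methods, and the single `supportsJsonSchema` flag are hypothetical simplifications.

```typescript
const TTL_MS = 24 * 60 * 60 * 1000; // 24 hours, as stated in the notes

interface CapabilityEntry {
  supportsJsonSchema: boolean;
  learnedAt: number; // epoch milliseconds
}

// Hypothetical cache: remembers what was learned from real requests.
class CapabilityCache {
  private entries = new Map<string, CapabilityEntry>();

  // Record the outcome of an actual request against a model.
  record(model: string, supportsJsonSchema: boolean, now: number = Date.now()): void {
    this.entries.set(model, { supportsJsonSchema, learnedAt: now });
  }

  // Returns undefined when nothing is known or the entry has expired.
  lookup(model: string, now: number = Date.now()): boolean | undefined {
    const entry = this.entries.get(model);
    if (!entry) return undefined;
    if (now - entry.learnedAt >= TTL_MS) {
      this.entries.delete(model);
      return undefined;
    }
    return entry.supportsJsonSchema;
  }
}
```

A caller that gets `false` can skip structured output for that model instead of retrying it, which is one way the retry loop described above could be avoided.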
Enhanced Error Handling (#221)
- Novita AI Support: Fixed generic "model inference" errors from Novita and similar providers
- Improved Fallback Chain: Better detection of structured output errors triggers proper fallback
- Conversation Loading Fix: Resolved TypeError when loading conversations with tool results
Cloudflare 524 Timeout Handling
- Improved Retry Logic: 524 timeout errors now properly treated as retryable
- Better Error Detection: Enhanced detection for gateway and Cloudflare-specific errors
- User-Friendly Messages: Clear console output for retry progress
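Treating 524 as retryable amounts to adding it to the set of transient status codes. In this sketch only 524 is named by the notes; the other codes, the exponential backoff, the delay values, and the message wording are assumptions.

```typescript
// 524 is Cloudflare's origin-timeout status; 502/503/504 are assumed
// examples of the gateway errors the notes mention.
const RETRYABLE_STATUS: number[] = [502, 503, 504, 524];

function isRetryableStatus(status: number): boolean {
  return RETRYABLE_STATUS.indexOf(status) !== -1;
}

// Exponential backoff capped at maxMs (illustrative values).
function backoffDelayMs(attempt: number, baseMs: number = 1000, maxMs: number = 30000): number {
  return Math.min(maxMs, baseMs * Math.pow(2, attempt));
}

// Hypothetical console message describing retry progress.
function retryMessage(status: number, attempt: number, maxAttempts: number): string {
  const delay = backoffDelayMs(attempt - 1);
  return `Request failed with ${status}; retrying (${attempt}/${maxAttempts}) in ${delay}ms`;
}
```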
🛠️ MCP (Model Context Protocol) Enhancements
MCP Initialization Progress Feedback (#224)
- Real-Time Updates: Shows which server is being initialized with progress count
- Visual Feedback: Clear "Initializing MCP tools" message with server names
- Polling Updates: UI updates every 500ms during initialization
- Seamless Transition: Automatically proceeds to agent mode when ready
- Fixes #218
MCP Server Configuration UX (#ddcc8aa)
- Simplified Command Input: Single field for full command (e.g., npx -y @server/name)
- Auto-Connect: Newly added servers automatically connect with status notifications
- Fixed Form Persistence: Resolved bug where old values persisted when switching modes
- Better Shell Parsing: Handles quoted paths and spaces in commands correctly
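Shell-like parsing of a single command field can be sketched as a small tokenizer that keeps quoted segments together. `splitCommand` is a hypothetical helper, and it deliberately ignores backslash escapes and variable expansion, which the real parser may or may not handle.

```typescript
// Split a command line on whitespace, keeping quoted segments intact.
function splitCommand(input: string): string[] {
  const args: string[] = [];
  let current = "";
  let quote: string | null = null; // the quote character currently open, if any
  let hasToken = false;            // true once the current argument has started

  for (let i = 0; i < input.length; i++) {
    const ch = input.charAt(i);
    if (quote !== null) {
      if (ch === quote) quote = null; // closing quote
      else current += ch;             // literal character inside quotes
    } else if (ch === '"' || ch === "'") {
      quote = ch;
      hasToken = true;                // an empty "" still counts as an argument
    } else if (ch === " " || ch === "\t") {
      if (hasToken) {
        args.push(current);
        current = "";
        hasToken = false;
      }
    } else {
      current += ch;
      hasToken = true;
    }
  }
  if (hasToken) args.push(current);
  return args;
}
```

For example, `splitCommand('"/Users/me/My Tools/server" --port 8080')` keeps the quoted path as one argument, and the first element can then be used as the command with the rest as its arguments.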
MCP Server Logging
- Capture Diagnostic Logs: View server output directly in UI
- Collapsible Log Viewer: Terminal-style display on MCP config cards
- Circular Buffer: In-memory log storage with clear functionality
- Clean UI: No [stderr] labels, just timestamp + message
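An in-memory circular buffer for server logs might look like this. `LogRingBuffer` and `LogEntry` are hypothetical names, and the capacity is left to the caller since the notes do not state one.

```typescript
interface LogEntry {
  timestamp: number; // epoch milliseconds
  message: string;
}

// Fixed-capacity ring: once full, each new entry overwrites the oldest one.
class LogRingBuffer {
  private slots: (LogEntry | undefined)[];
  private next = 0;  // index where the next entry will be written
  private count = 0; // number of valid entries currently stored

  constructor(capacity: number) {
    if (capacity < 1) throw new Error("capacity must be at least 1");
    this.slots = new Array<LogEntry | undefined>(capacity);
  }

  push(message: string, timestamp: number = Date.now()): void {
    this.slots[this.next] = { timestamp, message };
    this.next = (this.next + 1) % this.slots.length;
    this.count = Math.min(this.count + 1, this.slots.length);
  }

  // Entries ordered from oldest to newest.
  snapshot(): LogEntry[] {
    const size = this.slots.length;
    const start = (this.next - this.count + size) % size;
    const out: LogEntry[] = [];
    for (let i = 0; i < this.count; i++) {
      const entry = this.slots[(start + i) % size];
      if (entry) out.push(entry);
    }
    return out;
  }

  // Drop all stored entries, as the "clear" control in the UI would.
  clear(): void {
    this.slots = new Array<LogEntry | undefined>(this.slots.length);
    this.next = 0;
    this.count = 0;
  }
}
```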
🎯 Agent Mode Improvements
Stuck Loading State Fixes (#222, #223)
- Visible Killswitch Button: Always available when processing with confirmation dialog
- Enhanced Keyboard Shortcuts: Work regardless of internal state flags
- Mutation State Reset: Properly cleans up React Query mutations on emergency stop
- Comprehensive Cleanup: Resets all processing states, ends conversations, stops TTS
- Fixes #216
Tool Call Display Enhancements
- Immediate Tool Call Display (#202): Tool requests appear before responses, not after
- Stable Expansion State (#209): Tool calls remain expanded when new messages arrive
- Content-Based IDs: Stable hashing prevents expansion state loss
- Full Tool Details (#142): Complete parameters and results with JSON formatting
- Pending State Indicators (#71fa447): Clear "Pending..." badge while waiting for responses
- Fixes #196, #201
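Content-based IDs keep expansion state stable because the ID is derived from what the tool call contains rather than from when it was rendered. The sketch below uses a 32-bit FNV-1a hash over UTF-16 code units; the hash choice, `toolCallId`, and the `ToolCall` shape are assumptions, not the project's actual scheme.

```typescript
// Hypothetical shape of a tool call; arguments may be missing.
interface ToolCall {
  name: string;
  arguments?: Record<string, unknown>;
}

// 32-bit FNV-1a over UTF-16 code units, returned as lowercase hex.
function fnv1a(text: string): string {
  let hash = 0x811c9dc5;
  for (let i = 0; i < text.length; i++) {
    hash ^= text.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193);
  }
  return (hash >>> 0).toString(16);
}

// Same name, arguments, and position always yield the same ID.
// Note: JSON.stringify depends on key insertion order.
function toolCallId(call: ToolCall, index: number): string {
  const payload = call.name + ":" + JSON.stringify(call.arguments ?? {});
  return `tool-${index}-${fnv1a(payload)}`;
}
```

A UI can then store its expanded/collapsed flags in a map keyed by this ID, so re-rendering after a new message finds the same keys.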
Conversation History Improvements (#210)
- Complete History Saved: All messages, tool calls, and results preserved
- Accurate Timestamps: Original message timestamps maintained
- Proper Ordering: Message sequence correctly preserved
- Fixes #195
Empty Response Handling (#52a92c1)
- Graceful Fallback: Handles null/empty LLM responses without crashing
- Case-Insensitive Detection: Better error pattern matching
- Continued Execution: Logs errors and continues instead of crashing
- Fixes #172
🎨 UI/UX Enhancements
Profile Management System (#3760558)
- Save/Load Profiles: Create named profiles with custom guideline configurations
- 3 Default Profiles: Default, Git & Version Control, AI Coding Agent
- Import/Export: Share profiles between installations
- Persistent Storage: Profiles saved and restored across sessions
- Fixes #199
Waveform & Panel Improvements (#205)
- Size Matching: Panel automatically sizes to accommodate full waveform (70 bars)
- Persistent Resize: User resize preferences saved per mode (normal/agent/textInput)
- Minimum Width: Enforced ~172px minimum to prevent waveform cutoff
- Aerospace/niri Support: Proper floating behavior in tiling window managers
- Fixes #203, #186
Settings & Navigation
- Settings Menu Restored (#ffbde3c): Added back to tray menu without non-functional keybind
- History Section (#149): Renamed "Conversations" to "History" for better clarity
- Model Selector Focus Fix (#192): Prevents focus loss while typing
- Fixes #194, #197
🔊 TTS (Text-to-Speech) Improvements
Enhanced Kill Switch (#193, #188)
- Global TTS Manager: Centralized control for all audio elements
- Emergency Stop Integration: Stops TTS on kill switch and ESC key
- Auto-Play Prevention: Blocks TTS after agent termination
- Stop TTS Button: Visible control in settings sidebar
- Playing State Tracking: Button only shows when TTS is actively playing
CORS Support for Remote Server (#881512f)
- Configurable Origins: Default to * for development
- Preflight Handling: Skip auth for OPTIONS requests
- UI Controls: Manage CORS settings in remote server configuration
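The preflight rule can be sketched as a pure function that answers OPTIONS before any authentication check runs. Everything here is illustrative: `handleRequest`, the request shape, and the chosen status codes are assumptions and do not reflect the server's actual route handlers.

```typescript
interface IncomingRequest {
  method: string;
  headers: Record<string, string | undefined>; // lowercase header names assumed
}

interface Decision {
  status: number;
  headers: Record<string, string>;
}

// Build CORS headers for the configured origins ("*" is the documented default).
function corsHeaders(allowedOrigins: string[], origin: string | undefined): Record<string, string> {
  const common = {
    "Access-Control-Allow-Methods": "GET, POST, OPTIONS",
    "Access-Control-Allow-Headers": "Authorization, Content-Type",
  };
  if (allowedOrigins.indexOf("*") !== -1) {
    return { "Access-Control-Allow-Origin": "*", ...common };
  }
  if (origin !== undefined && allowedOrigins.indexOf(origin) !== -1) {
    return { "Access-Control-Allow-Origin": origin, Vary: "Origin", ...common };
  }
  return {};
}

// Preflight requests carry no credentials, so they bypass the bearer check.
function handleRequest(req: IncomingRequest, apiKey: string, allowedOrigins: string[] = ["*"]): Decision {
  const headers = corsHeaders(allowedOrigins, req.headers["origin"]);
  if (req.method === "OPTIONS") return { status: 204, headers };
  if (req.headers["authorization"] !== `Bearer ${apiKey}`) return { status: 401, headers };
  return { status: 200, headers };
}
```

Browsers send the OPTIONS preflight without the Authorization header, which is why requiring auth on it would block every cross-origin call.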
🐛 Bug Fixes
- Tool Execution IDs (#eb00cae): Handle undefined arguments without crashing
- MCP Server Cleanup (#0606d70): Emergency stop no longer kills persistent MCP servers
- Quoted Paths (#daaea0b): Proper shell-like parsing for commands with spaces
- Linux Desktop Integration (#bc11c72): Fixed app menu appearance, icons, and PATH symlink
- Linux Startup Notification (#9b7c554): Disabled distracting "SpeakMCP is ready" popup
- macOS Floating Panel (#8d6f72f): Proper z-order and focusability for Aerospace compatibility
🔧 Technical Improvements
Developer Experience
- Node Version Pinning (#306dc17): Added .nvmrc specifying Node v20.19.5
- Worktree Setup Script (#c69e2b7): Fast worktree setup (~30sec vs ~3min)
- UI Debug Mode (#192): Track focus, renders, and state changes with DEBUG_UI=1
- Enhanced Logging: Comprehensive debug output throughout codebase
Code Quality
- Optional Chaining (#ec71fc5): Replaced non-null assertions for better resilience
- CodeRabbit Suggestions: Addressed review feedback across multiple PRs
- TypeScript Fixes: Resolved type errors and improved type safety
- Defensive Programming: Added guards and fallbacks throughout
📦 Downloads
Cross-Platform Support: macOS (Apple Silicon & Intel)
macOS Builds
- DMG: SpeakMCP-0.2.3-arm64.dmg (102M) | SpeakMCP-0.2.3-x64.dmg (109M)
- PKG: SpeakMCP-0.2.3-arm64.pkg (101M) | SpeakMCP-0.2.3-x64.pkg (109M)
- ZIP: SpeakMCP-0.2.3-arm64.zip (100M) | SpeakMCP-0.2.3-x64.zip (108M)
🔄 Migration Notes
- No breaking changes - All existing functionality preserved
- Automatic migration - Settings and data migrate seamlessly
- New features opt-in - All new features work with existing configurations
- Backward compatible - Existing API endpoints and data structures unchanged
📝 Technical Details
- 50+ commits since v0.2.2
- Version: 0.2.2 → 0.2.3
- Rust crate: Updated to 0.2.3
- Node.js: Pinned to v20.19.5
🙏 Acknowledgments
Thanks to all contributors and users who provided feedback!
Key Pull Requests:
- #229 - Auto-detection for model capabilities
- #224 - MCP initialization progress feedback
- #222 - Fix stuck loading state in agent mode
- #221 - Improve error handling for providers
- #210 - Save complete conversation history
- #205 - Waveform size matching and persistence
- #193 - Enhanced TTS kill switch
- #192 - Model selector focus fix
Issues Closed:
- #218 - MCP initialization feedback
- #216 - Stuck loading state
- #203 - Waveform size issues
- #199 - Profile management
- #197 - Settings menu restoration
- #196 - Tool call expansion state
- #195 - Conversation history
- #194 - Settings menu removal
- #188 - TTS kill switch
- #186 - Waveform rendering
- #172 - Empty response handling
Full Changelog: v0.2.2...v0.2.3
Released: November 2025 | License: AGPL-3.0
v0.2.2 - Stable Windows & Linux Builds
SpeakMCP v0.2.2
🎉 What's New
🌐 Remote Server API (Phase 1)
Transform SpeakMCP into an API-accessible AI agent service!
- OpenAI-Compatible HTTP Server with /v1/chat/completions and /v1/models endpoints
- Secure Bearer Token Authentication with auto-generated API keys
- Full Agent Mode Support - Run MCP tools via HTTP API
- Flexible Configuration - Configurable port, bind address, and logging
- New Settings UI - Manage server, API keys, and view usage instructions
Example Usage:
curl -X POST http://127.0.0.1:3210/v1/chat/completions \
-H "Authorization: Bearer YOUR_API_KEY" \
-H "Content-Type: application/json" \
-d '{"messages":[{"role":"user","content":"List my files"}]}'
🪟 Windows Build Improvements
- ✅ Fixed Fastify module loading errors
- ✅ Custom application icon for Windows
- ✅ Improved native dependency handling
- ✅ Production-ready Windows builds
🎯 Agent Control Enhancements
- Visual Kill Switch - Stop individual agents with red X button in progress window
- Confirmation Dialog - Prevents accidental termination
- Comprehensive Cleanup - Aborts LLM requests, stops MCP servers, kills processes
- Visual Feedback - Clear "Stopped" status with terminated badge
🧪 Testing Infrastructure
- VNC GUI Testing for GitHub Actions
- Automated GUI testing in CI/CD
- Comprehensive testing documentation
🔧 Improvements
- MCP Tool Counter - Display total count of enabled tools (#182)
- Remote Server Settings - Improved UI with better layout
- macOS Build Fixes - Resolved codesign and architecture issues (#181)
- Better Error Handling - Improved stability across platforms
📦 Downloads
Cross-Platform Support: macOS (Apple Silicon & Intel), Windows (x64), Linux (x64)
🔄 Migration Notes
- No breaking changes - All existing functionality preserved
- Remote server disabled by default - Enable in Settings → Remote Server
- Automatic migration - Settings and data migrate seamlessly
📝 Technical Details
- 39 files changed: +6,157 additions, -1,021 deletions
- New dependencies: Fastify for HTTP server
- Version: 0.2.1 → 0.2.2
🐛 Bug Fixes
- Fixed Windows Fastify module loading
- Resolved macOS codesign timestamp issues
- Fixed architecture compatibility on macOS
- Improved native extension handling
📚 Documentation
- Remote Server Guide: docs/remote-server-phase-1.md
- VNC Testing: .github/VNC_QUICK_START.md
- Updated README with new features
🙏 Acknowledgments
Thanks to all contributors and users who provided feedback!
Pull Requests:
- #176 - Visual kill switch for agent progress windows
- #166 - Remote Server Phase 1: OpenAI-compatible HTTP API
Issues Closed:
- #182 - Add total tools enabled counter
- #181 - Build launch architecture error
- #175 - Visual kill switch request
Full Changelog: v0.2.1...v0.2.2
Released: October 2025 | License: AGPL-3.0
v0.2.1
Full Changelog: v0.2.0...v0.2.1