Releases: aj47/SpeakMCP

SpeakMCP v1.4.0

20 Feb 21:09

SpeakMCP v1.4.0

New Features

  • Mobile Settings Parity: Added settings for skills, memories, personas, and loops on mobile
  • Supertonic TTS: Added Supertonic as a new local text-to-speech option
  • Past Sessions Modal: Moved past sessions from sidebar to a modal with delete support
  • Agent Spoken Output: Added explicit respond_to_user spoken-output flow for multi-channel support
  • Skills in Prompts: Enabled skills now appear in the predefined prompts dropdown menu
  • Ephemeral Messages: Internal completion nudges are now hidden via an ephemeral message system
  • Continue Conversation Shortcuts: Added Shift+hotkey keybinds to continue last conversation
  • Unlimited Agent Runs: Added option to disable max iteration limit for unlimited agent loops
  • Transcription Preview: Added opt-in live transcription preview during recording
  • Mobile Session Sync: Sync sessions between mobile and desktop with lazy loading

Bug Fixes

  • Fixed React deduplication and streaming repetition bugs on mobile
  • Fixed duplicate assistant messages and preserved mobile progress on empty history
  • Fixed TTS playback issues: stale generation results, double playback on remount, cleanup timing
  • Fixed streaming state getting stuck in "generating" when throttle drops events
  • Fixed sidebar collapse button spacing on macOS
  • Fixed search icon alignment and standardized icon-text spacing across UI
  • Fixed dialog grid children width overflow
  • Fixed input drafts sync overwriting user typing
  • Fixed mobile transcription: word-boundary matching, StrictMode compat, focus timeouts
  • Fixed hold-mode race conditions in continue-conversation shortcuts
  • Fixed WebM to float32 PCM decoding for Parakeet STT
  • Made network retry delays interruptible by the kill switch
  • Bundled sherpa-onnx native packages properly in packaged app
  • Gated SpeakMCP-specific fetches behind isSpeakMCPServer check

Improvements

  • Consolidated duplicated summarization settings UI
  • Session tiles now fill available vertical space
  • Final assistant messages expand by default in tile/sessions view
  • Grid view click now resets tile layout (removed separate reset button)
  • Per-message TTS on mobile

Downloads

  • macOS (Apple Silicon): SpeakMCP-1.4.0-arm64.dmg - Signed and notarized
  • Android: SpeakMCP-1.4.0.apk

v1.3.0

17 Feb 04:01

SpeakMCP v1.3.0

🚀 New Features

  • Agent Loops System — Schedule agents to run with specific prompts at regular intervals (#1036)
  • Terminal QR Code — QR code rendering for mobile pairing in headless/SSH/terminal environments (#1025)
  • Collapsible Queued Messages — Message queue panel is now collapsible and height-limited (#1042)
  • Rapid Fire Voice (Mobile) — Hold-to-speak rapid fire voice input with large mic button (#1024)
  • Conversation Keybinds — Shift+hotkey shortcuts for conversation navigation (#1021)
  • Supertonic TTS — New TTS provider support (#1000)
  • Disable Max Iteration Limit — Option to remove agent iteration cap (#1017)
  • Google Assistant Integration (Mobile) — Trigger SpeakMCP via "Hey Google" App Actions (#1022)
  • File-Based Dynamic Context Discovery — Efficient token usage via file-synced MCP tools, profiles, and skills (#897)
  • Debug Logging — Comprehensive debug logging feature with file management and UI (#178)
  • Persistent Cloudflare Tunnel URL — Store and display last known tunnel URL for reconnection (#723)
  • Bundled Electron MCP Skill — Bootstrap skill for local Electron MCP server (#1056)
  • Standalone Server Package — Central server architecture for multi-client support (#790, #791)

🐛 Bug Fixes

  • Desktop Kill Switch — Fixed kill switch not stopping agent sessions; now session-aware with interruptible retry delays (#1058, #255, #1023)
  • Mobile Session Sync — Fixed mobile sessions failing to load messages on desktop (#1059)
  • Mobile Session Polling — Added foreground session polling so new desktop sessions are detected (#1055)
  • Follow-up Message Display — Messages now appear immediately after session stop (#1057)
  • Claude Models via OpenRouter — Prevent assistant message prefill error (#1037)
  • Mobile Streaming — Fixed double words in streaming responses (#1028)
  • Model Search Sticky — Search input stays sticky in settings dropdown (#1041)
  • Waveform Height — Increased waveform visualization height and prevented excessive shrinking (#1052)
  • Settings Sidebar Scroll — Settings panel now scrolls with sidebar (#1051)
  • Mobile Rapid Fire UX — Improved session visibility and voice feedback (#1038)
  • Verification Corruption — Prevent verification from corrupting final agent response (#1050)
  • Renderer Crash Recovery — Auto-recover from GPU/renderer process crashes (#810)
  • Mobile Network Failures — Graceful error handling with retry for app backgrounding (#489)
  • Hardened Runtime — Enabled by default for macOS permission persistence (#847)
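
The Desktop Kill Switch fix above mentions interruptible retry delays. One way to get that behaviour, sketched here under assumptions, is to tie each backoff wait to an AbortSignal. The function names, the attempt count, and the exponential backoff are all illustrative and are not taken from the SpeakMCP source.

```typescript
// Hypothetical helper: a delay that resolves early when the signal aborts.
function interruptibleDelay(
  ms: number,
  signal: AbortSignal,
): Promise<"elapsed" | "aborted"> {
  return new Promise((resolve) => {
    if (signal.aborted) {
      resolve("aborted");
      return;
    }
    const onAbort = (): void => {
      clearTimeout(timer);
      resolve("aborted");
    };
    const timer = setTimeout(() => {
      signal.removeEventListener("abort", onAbort);
      resolve("elapsed");
    }, ms);
    signal.addEventListener("abort", onAbort, { once: true });
  });
}

// Hypothetical retry loop: backs off between attempts, but stops at once
// when the kill switch aborts the signal. Defaults are assumed values.
async function retryWithKillSwitch<T>(
  attempt: () => Promise<T>,
  signal: AbortSignal,
  maxAttempts = 3,
  baseDelayMs = 1000,
): Promise<T> {
  let lastError: unknown;
  for (let i = 0; i < maxAttempts; i++) {
    if (signal.aborted) throw new Error("stopped by kill switch");
    try {
      return await attempt();
    } catch (err) {
      lastError = err;
    }
    if (i < maxAttempts - 1) {
      const outcome = await interruptibleDelay(baseDelayMs * 2 ** i, signal);
      if (outcome === "aborted") throw new Error("stopped by kill switch");
    }
  }
  throw lastError;
}
```

A session-aware kill switch would hold one AbortController per session and call abort() on the one being stopped.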

⚡ Performance

  • Async Index + Throttled Progress — Asynchronous, debounced conversation index writes and throttled progress emits reduce main process blocking (#1060)
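
To illustrate the debouncing idea behind the index writes, here is a minimal sketch. The debounce helper, the 500 ms wait, and the saveIndex name are assumptions for the example, not the implementation from #1060.

```typescript
// Hypothetical helper: coalesce rapid calls into one trailing call that
// carries the most recent arguments.
function debounce<A extends unknown[]>(fn: (...args: A) => void, waitMs: number) {
  let timer: ReturnType<typeof setTimeout> | null = null;
  let lastArgs: A | null = null;

  // Run the pending call immediately (for example, before the app quits).
  const flush = (): void => {
    if (timer !== null) {
      clearTimeout(timer);
      timer = null;
    }
    if (lastArgs !== null) {
      const args = lastArgs;
      lastArgs = null;
      fn(...args);
    }
  };

  const call = (...args: A): void => {
    lastArgs = args;
    if (timer !== null) clearTimeout(timer);
    timer = setTimeout(flush, waitMs); // restart the quiet period
  };

  return { call, flush };
}

// Usage sketch: three updates in a burst produce one write.
const indexWrites: string[] = [];
const saveIndex = debounce((snapshot: string) => {
  indexWrites.push(snapshot); // stand-in for an asynchronous file write
}, 500);
saveIndex.call("rev-1");
saveIndex.call("rev-2");
saveIndex.call("rev-3");
saveIndex.flush(); // indexWrites now holds only "rev-3"
```

Debouncing suits index writes because only the latest snapshot matters. Progress events are better throttled, since the UI still needs periodic updates during a long burst.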

🎨 UI Improvements

  • Sessions Sort by Last Modified (#1016)
  • Session Tiles Fill Vertical Space (#1018)
  • Search Bar Model Dropdown Fix (#1015)
  • Groq API Pricing Research (#1009)
  • Improved Agent Documentation (#1010)

📦 Downloads

  • macOS (Apple Silicon): SpeakMCP-1.3.0-arm64.dmg
  • macOS (Intel): SpeakMCP-1.3.0-x64.dmg
  • Android: SpeakMCP-1.3.0.apk

Note: macOS DMGs are code-signed and notarized by Apple for safe installation.

v1.2.0

08 Jan 01:19
13c3ed1

🎯 Major Features

🎭 Agent Personas & Multi-Agent Delegation (#920)

  • Agent Personas - Create specialized AI personas with custom system prompts, tools, and skills
  • Delegation System - Route tasks to specialized sub-agents based on expertise
  • Agent Profile Management - Full CRUD for managing delegation targets
  • Internal & External Agents - Support for both built-in personas and external agents
  • Settings UI - Configure at Settings → Agent Personas

🤖 External ACP Agents (#894, #920)

  • ACP Protocol Support - Connect to external AI agents via Agent Client Protocol
  • Claude Code Integration - Delegate coding tasks to Claude Code
  • Auggie Support - Connect to Augment's Auggie agent
  • Multiple Transport Types:
    • stdio - Spawn local agent processes
    • remote - Connect to HTTP endpoints
    • internal - Built-in delegation within SpeakMCP
  • Delegation Tools:
    • list_available_agents - Discover available specialized agents
    • delegate_to_agent - Route tasks to specific agents
    • get_delegation_status - Check on delegated task progress

🧠 Dual-Model Agent Mode (#919)

  • Strong Model for Planning - Use powerful models for complex reasoning
  • Weak Model for Summarization - Use faster/cheaper models for UI summaries
  • Agent Summary View - See compact summaries of agent progress
  • Memory System - Save important findings to persistent memory files
  • Settings UI - Configure at Settings → Providers & Models

💾 Agent Memory System (#919, #963, #975)

  • Persistent Memories - Save key information across sessions
  • Ultra-Compact Format - Single-line memories for efficiency
  • Agent Memory Tools:
    • list_memories - View saved memories
    • save_memory - Create new memories
    • delete_memory - Remove memories
    • delete_multiple_memories - Bulk delete
    • delete_all_memories - Clear all memories
  • Bulk Delete UI - Select and delete multiple memories in Settings

🧠 Agent Skills System (#895, #958)

  • Skills Service - Modular skills system for enhancing AI capabilities
  • Per-Profile Skills - Each profile can have its own set of skills
  • Import from GitHub - Import skills directly from GitHub repositories
  • Import from Local Folders - Add skills from local directories
  • Progressive Loading - Skills load on-demand to reduce token usage
  • Auto-Refresh - New skills detected automatically without restart
  • Proactive Context - Skills injected into system prompt automatically (#942)
  • Bundled Skills - Includes "Agent Skill Creation" meta-skill

📊 Langfuse Observability Integration (#929, #941, #947)

  • LLM Call Tracing - All LLM calls traced with model, prompts, responses, and token usage
  • Agent Session Traces - Complete agent workflows tracked from start to finish
  • MCP Tool Call Spans - Each tool invocation logged with inputs/outputs
  • Sessions Support - Group traces by conversation for multi-turn debugging
  • Profile Tags - Filter traces by profile name in Langfuse dashboard
  • Optional Dependency - Install langfuse only when needed
  • Settings UI - Configure via Settings > General > Langfuse Observability

🔗 Persistent Cloudflare Tunnel URLs (#922, #954)

  • Named Tunnels - Persistent URLs that remain the same across restarts
  • Quick Tunnels - Existing random URL functionality preserved
  • Auto-Start on Launch - Tunnel can start automatically when app opens
  • Tunnel Mode Selector - Choose between Quick and Named tunnels in UI
  • Available Tunnels List - Shows existing tunnels when logged in

📱 Mobile & Cross-Device Features (#962, #972)

  • Chat Sync - Sync chat state between desktop and mobile app
  • Conversation Continuity - Continue conversations across devices
  • Compact Chat UI - Single-line collapsed view for mobile
  • Pull-to-Refresh - Sync with desktop in real-time

🔧 Inter-Agent Communication (#959)

  • send_agent_message - Send messages between running agent sessions
  • Agent Coordination - Enables collaborative multi-agent workflows
  • list_running_agents - Discover active agent sessions

📱 WhatsApp Harness Improvements (#910, #905)

  • Message Handling in MCP Server - Moved message logic from desktop harness to MCP server
  • Conversation ID Persistence - Tracks conversations across sessions
  • /new Command - Start fresh conversations with /new command
  • Automatic Typing Indicator - Shows typing when agent starts processing
  • Harness Output Mode - Configurable auto-response via harness layer
  • WhatsApp Toggle - Enable/disable in main settings UI (#934)

💾 Context Compaction & Memory Management (#908, #909)

  • Persistent Compaction - Older messages summarized and saved to disk
  • No More Re-summarization - Summaries persist across sessions
  • Conversation Compaction on Load - Summarizes older messages when exceeding 20 messages
  • MCP Process Cleanup - Properly terminates MCP server processes on app quit
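
As a rough illustration of compaction on load, the sketch below replaces older messages with a single summary once a conversation exceeds 20 messages. Only the threshold of 20 comes from the notes above. The Message shape, the keepRecent value, and the injected summarize callback are assumptions.

```typescript
// Hypothetical message shape for the example.
interface Message {
  role: "user" | "assistant" | "system";
  content: string;
}

// Summarize everything except the most recent messages once the
// conversation grows past maxMessages. keepRecent is an assumed value.
function compactConversation(
  messages: Message[],
  summarize: (older: Message[]) => string,
  maxMessages = 20,
  keepRecent = 10,
): Message[] {
  if (messages.length <= maxMessages) return messages;
  const cut = messages.length - keepRecent;
  const older = messages.slice(0, cut);
  const recent = messages.slice(cut);
  const summary: Message = { role: "system", content: summarize(older) };
  return [summary, ...recent];
}
```

Persisting the returned summary message to disk is what would avoid re-summarizing the same history in later sessions.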

🎮 Profile CRUD Tools (#938)

  • create_profile - Create new profiles programmatically
  • update_profile - Modify existing profiles
  • delete_profile - Remove profiles with safeguards
  • duplicate_profile - Copy profiles including all configurations

📐 Model Registry with Fuzzy Matching (#907)

  • ~100 Models Supported - Comprehensive registry for context window detection
  • Fuzzy Matching - Correctly identifies models through proxies (e.g., Claude via OpenAI-compatible endpoints)
  • Providers Covered - Anthropic, OpenAI, Google, xAI, DeepSeek, Mistral, Qwen, Llama
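
Fuzzy matching of this kind can be approximated by normalizing the model ID and picking the longest registry key it contains. The sketch below is an assumption about the approach, not the registry from #907. The model names and context window sizes are made-up placeholders.

```typescript
// Placeholder registry: names and sizes are illustrative, not real limits.
const EXAMPLE_REGISTRY: Record<string, number> = {
  "acme-chat": 200000,
  "acme-chat-mini": 32000,
};

// Lowercase the ID and collapse separators so proxy prefixes, version
// suffixes, and casing differences do not block a match.
function normalizeModelId(id: string): string {
  return id.toLowerCase().replace(/[^a-z0-9]+/g, "-").replace(/^-+|-+$/g, "");
}

// Return the context window of the longest registry key contained in the ID.
function lookupContextWindow(
  modelId: string,
  registry: Record<string, number>,
): number | undefined {
  const wanted = normalizeModelId(modelId);
  let bestKey = "";
  let bestValue: number | undefined;
  for (const key of Object.keys(registry)) {
    const candidate = normalizeModelId(key);
    if (wanted.indexOf(candidate) !== -1 && candidate.length > bestKey.length) {
      bestKey = candidate;
      bestValue = registry[key];
    }
  }
  return bestValue;
}
```

Preferring the longest match keeps a proxied ID such as "proxy/acme/Acme-Chat-Mini:latest" from resolving to the shorter "acme-chat" entry.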

🛠️ MCP Tool Discovery (#948, #950)

  • Lazy Loading - Lightweight system prompt with on-demand tool details
  • list_server_tools - Get all tools from a specific MCP server
  • get_tool_schema - Get full JSON schema for any tool
  • Dynamic Tool Filtering - Reduce tool overhead from 64 to ~20 per call

🚀 UX & UI Improvements

Session Tiles

  • Expand to Full Window (#915) - New button to expand session tile to fill entire window
  • Clickable Title Area (#886) - Click anywhere on title to collapse/expand
  • Removed Max Height Constraint (#927) - Tiles can now fill available vertical space
  • Proper Sizing Transitions (#914) - Panel resizes correctly when switching from voice to agent mode

Floating Panel

  • Auto-Hide When Main Focused (#887) - Panel hides when main window has focus
  • Cleaner UI (#904) - Removed ESC hint and tool call progress indicator
  • Responsive Hotkey Hints (#898, #980) - Hints hide on narrow screens and during sessions
  • Drag Fix (#978) - Panel no longer closes during drag/button interactions

Sidebar & Navigation

  • Improved Sidebar UI (#930) - Added 'Past' label, simplified search placeholder
  • Past Sessions Cleanup (#891) - Removed redundant title text
  • Agent Mode Keybind Visibility (#892) - Aura keybind now visible on sessions page

Mobile App

  • Compact Tool Calls (#932, #972) - Tool calls collapse to single line
  • Session Title Fix (#931) - Correct title shows instead of 'Transcribing...'
  • Compact Chat UI (#972) - Single-line collapsed view matching desktop styling

Streamer Mode (#893)

  • Hide Sensitive Info - Masks phone numbers, API keys, QR codes, and URLs
  • Global Indicator - Shows in sidebar footer when active
  • Privacy During Streaming - Protect sensitive data when screen sharing
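
For text content, masking of this sort is typically pattern based. The sketch below shows the idea with three illustrative regular expressions. The patterns, the "sk-" key prefix, and the placeholder strings are assumptions, and QR codes would need separate handling in the UI.

```typescript
// Hypothetical masking pass for text shown while Streamer Mode is active.
// The patterns are deliberately simple and will not catch every format.
function maskSensitive(text: string): string {
  return text
    .replace(/https?:\/\/[^\s]+/g, "[url hidden]")
    .replace(/\bsk-[A-Za-z0-9_-]{8,}\b/g, "[key hidden]")
    .replace(/\+?\d[\d\s().-]{7,}\d/g, "[phone hidden]");
}

const maskedExample = maskSensitive(
  "Call +1 (555) 010-9999 or visit https://example.com/x with key sk-abcdefgh12345678",
);
// maskedExample: "Call [phone hidden] or visit [url hidden] with key [key hidden]"
```

URLs are masked first so that digits inside a URL are not mistaken for a phone number.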

🔧 API & Settings Improvements

Feature Toggle API (#939)

  • verificationEnabled - Query/toggle verification setting
  • messageQueueEnabled - Query message queue setting
  • parallelToolExecutionEnabled - Query parallel execution setting
  • toggle_verification - New tool to enable/disable verification

New Builtin Tools

  • read_media_file - Read images/audio as base64 for multimodal LLM input
  • load_skill_instructions - Fetch full skill instructions on-demand

Settings Reorganization

  • Langfuse Moved (#949) - Settings moved from sidebar to General Settings page
  • Removed Auto-Configured Filesystem Server (#961) - Use execute_command for filesystem operations

🐛 Bug Fixes

Agent Loop & Completion (#946, #967, #970)

  • Fixed empty response retry loops with counter and limit
  • Fixed verification loops incrementing fail count in all paths
  • Fixed tool context loss during context shrinking
  • Simplified LLM integration to trust the model's completion signals
  • Added tool timeout handling (20s default, 120s for browser automation)
  • Fixed parallel call race conditions
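
The tool timeout handling can be pictured as a wrapper around each tool call. In the sketch below, only the 20s and 120s values come from the notes above. The withTimeout helper and the "browser_" name prefix used to detect browser automation tools are assumptions.

```typescript
const DEFAULT_TOOL_TIMEOUT_MS = 20_000;
const BROWSER_TOOL_TIMEOUT_MS = 120_000;

// Assumption: browser automation tools are recognizable by a name prefix.
function timeoutFor(toolName: string): number {
  return toolName.indexOf("browser_") === 0
    ? BROWSER_TOOL_TIMEOUT_MS
    : DEFAULT_TOOL_TIMEOUT_MS;
}

// Reject if the work does not settle within ms; always clear the timer.
function withTimeout<T>(work: Promise<T>, ms: number, label: string): Promise<T> {
  return new Promise<T>((resolve, reject) => {
    const timer = setTimeout(
      () => reject(new Error(`${label} timed out after ${ms}ms`)),
      ms,
    );
    work.then(
      (value) => {
        clearTimeout(timer);
        resolve(value);
      },
      (err) => {
        clearTimeout(timer);
        reject(err);
      },
    );
  });
}
```

A caller would wrap each invocation as withTimeout(callTool(name, args), timeoutFor(name), name), where callTool stands for whatever function actually runs the tool. Note that the wrapper only stops waiting. It does not cancel the underlying work.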

Verification Logic (#951)

  • Fixed direct verifier call bypassing safety features
  • Fixed duplicate messages in verification context
  • Fixed loop detection false positives on short responses
  • Fixed empty tool results handling
  • Unified all verification paths

Voice & Recording

  • Mic Button Fix (#906) - Voice recording now takes precedence over text input
  • Waveform Fix (#976) - Waveform now shows for normal voice dictation mode

Model & Preset

  • Summarization Model Sync (#977) - Summarization model updates when switching presets

Memory & Processes

  • MCP Process Cleanup (#909) - No more orphaned node processes (2GB+ memory leak fixed)

📦 Downloads

Cross-Platform Support: macOS (Apple Silicon & Intel), Windows, Linux, Android, iOS

macOS Builds (Signed)

  • DMG: SpeakMCP-1.2.0-arm64.dmg | SpeakMCP-1.2.0-x64.dmg

Android

  • APK: SpeakMCP-1.2.0.apk

Linux

  • AppImage: SpeakMCP-1.2.0.AppImage

📥 Download Latest Release

🔄 Migration Notes

  • No breaking changes - All existing functionality preserved
  • Automatic migration - Settings and data migrate seamlessly
  • New features opt-in - All new features work with existing configurations
  • Backward compatible - Existing API endpoints and data structures un...

v1.1.0

24 Dec 06:49
3c2bf37

🎯 Major Features

🤖 Vercel AI SDK Migration (#812)

  • Simplified LLM Code - Removed ~1,100 lines of custom HTTP/fetch logic
  • Provider Flexibility - Easy to add new providers (Anthropic, etc.)
  • Better Streaming - AI SDK handles streaming protocols natively
  • Type Safety - Better TypeScript support from AI SDK
  • Maintained Compatibility - Same public API, MCP tools work unchanged

📋 Kanban View for Sessions (#807)

  • Three-Column Layout - Idle, In Progress, and Done columns
  • Visual State Indicators - Clear status for each session
  • View Toggle - Switch between Grid and Kanban views
  • Session Organization - Better workflow management

📝 Predefined Prompts (#809)

  • Save Frequent Prompts - Quick access to commonly used prompts
  • One-Click Insert - Click to insert prompt into input field
  • Full Management - Add, edit, and delete saved prompts
  • Persistent Storage - Prompts saved across sessions

📱 Mobile Settings Management (#744)

  • Profile Switching - Switch between profiles directly from mobile app
  • MCP Server Management - View connection status and enable/disable servers remotely
  • Feature Toggles - Control post-processing, TTS, and tool approval from mobile
  • Pull-to-Refresh - Sync settings with desktop in real-time
  • New API endpoints: /v1/profiles, /v1/mcp/servers, /v1/settings

🔍 MCP Registry Integration (#785)

  • Official Registry Browser - Discover 100+ MCP servers from the official registry
  • One-Click Installation - Add servers with a single click
  • Smart Search - Find servers by name and description
  • Server Type Badges - npm, PyPI, Docker, Remote server indicators
  • 5-Minute Caching - Reduced API calls for better performance
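
A time-based cache like the one described can be very small. The sketch below assumes an in-memory map keyed by request and an injectable clock. The class name and structure are illustrative rather than taken from #785.

```typescript
// Hypothetical in-memory cache whose entries expire after a fixed TTL.
class TtlCache<V> {
  private entries = new Map<string, { value: V; expiresAt: number }>();
  private ttlMs: number;
  private now: () => number;

  constructor(ttlMs: number, now: () => number = Date.now) {
    this.ttlMs = ttlMs;
    this.now = now;
  }

  get(key: string): V | undefined {
    const hit = this.entries.get(key);
    if (hit === undefined) return undefined;
    if (this.now() >= hit.expiresAt) {
      this.entries.delete(key); // expired: drop it and report a miss
      return undefined;
    }
    return hit.value;
  }

  set(key: string, value: V): void {
    this.entries.set(key, { value, expiresAt: this.now() + this.ttlMs });
  }
}

const FIVE_MINUTES_MS = 5 * 60 * 1000;
const registryCache = new TtlCache<string[]>(FIVE_MINUTES_MS);
```

A registry browser would check registryCache.get(query) before calling the registry API and store the response with set on a miss.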

📤 Enhanced Profile Export/Import (#772)

  • MCP Server Definitions - Export now includes all enabled MCP server configurations
  • Model Settings - Export includes model configuration settings
  • Smart Import - Merges MCP definitions without overwriting existing config
  • Easy Sharing - Share complete profiles with team members
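
The "merge without overwriting" behaviour of Smart Import can be sketched as follows. The ServerConfig shape and the choice to report skipped names are assumptions made for the example.

```typescript
// Hypothetical, simplified MCP server definition.
interface ServerConfig {
  command: string;
  args?: string[];
}

// Add imported servers that do not exist yet; never replace existing ones.
function mergeServerDefinitions(
  existing: Record<string, ServerConfig>,
  imported: Record<string, ServerConfig>,
): { merged: Record<string, ServerConfig>; skipped: string[] } {
  const merged: Record<string, ServerConfig> = { ...existing };
  const skipped: string[] = [];
  for (const name of Object.keys(imported)) {
    if (Object.prototype.hasOwnProperty.call(merged, name)) {
      skipped.push(name); // keep the user's current definition
    } else {
      merged[name] = imported[name];
    }
  }
  return { merged, skipped };
}
```

Returning the skipped names lets the UI tell the user which imported definitions were ignored.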

🚀 Performance & UX Improvements

Recording Latency Reduction (#734)

  • Lower latency - Reduced hold-to-record delay from 800ms → 250ms
  • Overlapped initialization - Start recording before showing panel UI
  • Snappier response - Faster feedback when holding Ctrl

Mobile Text Interaction (#735)

  • Expandable/Collapsible Text - Tap anywhere on collapsed text to expand
  • Selectable Content - Copy LLM responses and tool parameters
  • Better Tool Cards - Larger tap targets for expanding tool results
  • Visual Feedback - Pressed states for better UX

Session Management (#739, #740)

  • Always-Visible Start Buttons - Start new sessions anytime, even with active sessions
  • Queueable Voice Input - Record voice messages during agent processing
  • Message Queuing - Transcripts queue automatically when agent is busy
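
The queuing behaviour described above amounts to a small state machine: process a transcript immediately when the agent is idle, otherwise hold it until the current run finishes. The AgentInbox class below is a hypothetical sketch of that logic, not the code from #739 or #740.

```typescript
// Hypothetical queue for transcripts that arrive while the agent is busy.
class AgentInbox {
  private queue: string[] = [];
  private current: string | null = null;

  // Returns the transcript that starts processing now, or null if queued.
  submit(transcript: string): string | null {
    if (this.current !== null) {
      this.queue.push(transcript);
      return null;
    }
    this.current = transcript;
    return transcript;
  }

  // Call when the agent finishes; returns the next transcript, if any.
  finish(): string | null {
    this.current = this.queue.shift() ?? null;
    return this.current;
  }

  get queued(): number {
    return this.queue.length;
  }
}
```

Transcripts are processed in arrival order, so a voice message recorded mid-run is handled as soon as the agent becomes idle.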

UI Polish (#733, #738, #800, #801, #806, #811)

  • Stop Sign Icon - Changed kill switch from X to OctagonX for clarity
  • Collapsed Servers - MCP servers collapsed by default for cleaner UI
  • Kill Switch in Follow-ups - Stop button now in follow-up input panels
  • Responsive MCP Modal (#800) - Tool details modal now scrollable and responsive
  • Improved Sidebar Layout (#801) - Settings above sessions, sessions scroll to bottom
  • Sessions Icon (#806) - Quick navigation icon in collapsed sidebar
  • Edit Profiles Shortcut (#811) - Direct link to profile settings in dropdown

Mobile Improvements (#794, #816)

  • Android Branding (#794) - Updated app name, icons, and speakmcp:// deep linking
  • Conversation Recovery (#816) - Recover server state on connection retry (no duplicates)

🔧 Code Quality & Maintenance

Major Refactoring Sprint (#775, #776, #777, #778, #779, #780)

  • keyboard.ts Modularization (#775) - Split 1,170 lines into focused modules
  • MCP Config Manager (#776) - React Context to eliminate prop drilling
  • tipc.ts Split (#777) - 66% reduction (2,913 → 974 lines) into 14 domain modules
  • MCP Service Refactor (#778) - Extracted into focused, maintainable modules
  • ChatScreen Components (#779) - Mobile app code organization with custom hooks
  • LLM Provider Abstraction (#780) - Clean provider interface for all LLM backends

LLM Code Consolidation (#781)

  • Removed structured-output.ts - Consolidated unused code (~285 LOC reduction)
  • Moved makeStructuredContextExtraction - Relocated to llm-fetch.ts

Shared Package Improvements (#773)

  • Type Consolidation - AgentProgressStep, AgentProgressUpdate moved to shared
  • New Utilities - formatDuration, formatTimestamp, statusColors
  • Custom Hooks - useCollapsibleState, useCollapsibleSet for UI state
  • Dependency Cleanup - Proper runtime vs dev dependency classification

Testing Infrastructure (#770)

  • E2E Tests - Playwright tests for Electron with custom fixtures
  • Smoke Tests - App launch and basic navigation tests
  • Settings Tests - Configuration and provider testing
  • MCP Tests - Server management and session tests
  • CI Integration - GitHub Actions workflow for automated testing

Refactoring Issue Templates (#745)

  • Created 9 detailed issue templates for future refactoring work
  • Includes proposals for tipc.ts, mcp-service.ts, keyboard.ts modularization

🐛 Bug Fixes

LLM & Provider

  • Empty Response Handling (#793, #797) - Fixed false "Network error" for valid empty completions
  • Provider Name Display (#737) - Show actual preset name (OpenRouter, Together AI) instead of generic "OpenAI"
  • Groq TTS Update (#784) - Updated to Orpheus models (PlayAI deprecated)

Mobile & Desktop

  • Disabled Server Tools (#743) - Hide tools from disabled MCP servers
  • JSX Nesting (#728) - Fixed parse errors in MCP config manager
  • Tunnel Persistence (#722) - Auto-reconnect mobile app on restart with stable device ID

UI/UX

  • Tool Collapse (#713) - Collapsible server groups in Tools section with state persistence
  • Mic Button (#732) - Mic clickable during agent processing with message queuing

📊 Stats

  • 40+ PRs merged since v1.1.0
  • ~3,000+ lines refactored - Major code quality improvements
  • AI SDK migration - Simplified LLM integration
  • E2E testing infrastructure - New Playwright test suite
  • Improved mobile experience - Settings management, conversation recovery, Android branding

🔄 Migration Notes

  • No breaking changes - All existing functionality preserved
  • Automatic migration - Settings and data migrate seamlessly
  • New features opt-in - All new features work with existing configurations
  • Backward compatible - Existing API endpoints and data structures unchanged

📥 Downloads

Cross-Platform Support: macOS (Apple Silicon & Intel), Windows, Linux, Android, iOS

macOS Builds

  • DMG: SpeakMCP-1.2.0-arm64.dmg | SpeakMCP-1.2.0-x64.dmg
  • PKG: SpeakMCP-1.2.0-arm64.pkg | SpeakMCP-1.2.0-x64.pkg
  • ZIP: SpeakMCP-1.2.0-arm64.zip | SpeakMCP-1.2.0-x64.zip

📥 Download Latest Release

🙏 Acknowledgments

Thanks to all contributors and users who provided feedback!

Full Changelog: v1.1.0...v1.2.0


Jan 1st update:

New Features:

  • Reset Layout Button - Restores agent tiles to default dimensions in grid view
  • Profile Import/Export (Mobile) - Share API export + JSON paste import
  • Profile Import/Export (Desktop) - Added missing Import/Export buttons
  • GitHub Actions - New workflow for Linux/Windows builds

Bug Fixes:

  • Agent premature completion fix (LLM verifier always runs when enabled)
  • Text input hotkey fix (Ctrl+T no longer triggers voice timer)
  • Blank hover panel fix for completed snoozed sessions
  • Text input panel resize after waveform recording
  • Infinite refetch loop prevention on mobile
  • Verifier prompt bias fix
  • Safe error message checking
  • Profile import error handling separation

License: AGPL-3.0

SpeakMCP v1.0.0

08 Dec 03:35

SpeakMCP v1.0.0 - Initial Release 🎉

The first official release of SpeakMCP - an AI-powered dictation tool with MCP (Model Context Protocol) integration.

Features

  • 🎤 Voice Dictation - Hold Ctrl to record and transcribe your voice
  • 🤖 AI-Powered - Integrates with OpenAI, Anthropic, Google, and more
  • 🔧 MCP Tools - Connect to Model Context Protocol servers for extended functionality
  • 📱 Cross-Platform - Available for macOS (Intel & Apple Silicon) and Android
  • 🎯 Agent Mode - Multi-step AI agent with tool calling capabilities
  • 🔒 Privacy-First - All processing happens locally

Downloads

Windows

There are now .exe setup and .exe portable builds for Windows!

macOS

  • Apple Silicon (M1/M2/M3): SpeakMCP-1.0.0-arm64.dmg
  • Intel: SpeakMCP-1.0.0-x64.dmg
  • ZIP and PKG installers also available

Android

  • APK: SpeakMCP-1.0.0-android.apk

Installation

macOS

  1. Download the appropriate DMG for your Mac
  2. Open the DMG and drag SpeakMCP to Applications
  3. On first launch, you may need to right-click and select "Open" due to Gatekeeper
  4. Grant accessibility and microphone permissions when prompted

Android

  1. Download the APK file
  2. Enable "Install from unknown sources" in your device settings
  3. Install the APK
  4. Grant microphone permissions when prompted

Requirements

  • macOS: 12.0 (Monterey) or later
  • Android: Android 7.0 (API level 24) or later
  • API key for at least one AI provider (OpenAI, Anthropic, etc.)

Getting Started

  1. Launch SpeakMCP
  2. Enter your AI provider API key in Settings
  3. Hold Ctrl to start recording
  4. Release Ctrl to transcribe and get AI response
  5. Optional: Configure MCP servers for extended capabilities

Thank you for trying SpeakMCP! Please report any issues on GitHub.

v1.0.0 - Mobile App & Parallel Agent Sessions

07 Dec 20:27
150b161

🚀 SpeakMCP 1.0 is here!

This release marks a major milestone with two flagship features: the new Mobile App for voice-controlled AI on the go, and Parallel Agent Sessions for running multiple AI agents simultaneously.


⚠️ Platform Support

Windows and Linux do not currently support MCP tools in this release.

For Windows and Linux users who want dictation-only functionality, please use v0.2.2 which includes Windows (.exe) and Linux (.AppImage, .deb, .snap) builds.


📱 Mobile App

Control your AI agents from anywhere! The SpeakMCP mobile app connects to your desktop via the Remote Server API.

Features

  • Voice input - Speak commands to your AI agent from your phone
  • Real-time progress - Watch your agent work in real-time
  • Conversation continuity (#401) - Continue conversations seamlessly between mobile and desktop
  • Emergency kill switch (#398) - Stop all agents instantly with /v1/emergency-stop endpoint
  • QR code setup - Scan a QR code to configure the mobile app instantly

Monorepo Architecture (#417)

  • Converted to pnpm workspace structure
  • Mobile app lives at apps/mobile/
  • Desktop app at apps/desktop/
  • Shared design tokens package (@speakmcp/shared) for consistent styling
  • Unified development workflow: pnpm dev for desktop, pnpm dev:mobile for mobile

Cloudflare Tunnel Integration (#363)

  • One-click internet exposure - No port forwarding or network config needed
  • Cloudflare Quick Tunnel (no account required)
  • QR code generation with speakmcp:// deep links
  • Instant mobile app configuration by scanning

🔀 Parallel Agent Sessions

Run multiple AI agents at the same time! The new tiling session dashboard lets you manage multiple concurrent agent sessions.

Tiling Session Dashboard (#359)

  • Sessions dashboard is now the landing page
  • Responsive tiling layout (1=full width, 2=50/50, 3+=responsive grid)
  • Each tile shows full conversation with internal scroll
  • New sessions animate into the grid
  • Tile size persistence (#410) - Tiles remember their size
  • Default tile width optimized (#407) - Two tiles fit side-by-side
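
The tiling rule (1 = full width, 2 = 50/50, 3+ = responsive grid) can be expressed as a column-count function. The first two cases follow the notes above. The 3+ case, including the 400px minimum tile width, is an assumption about what "responsive grid" means.

```typescript
// Hypothetical layout helper: how many columns the session grid should use.
function tileColumns(
  sessionCount: number,
  containerWidthPx: number,
  minTileWidthPx = 400, // assumed minimum width, not from the release notes
): number {
  if (sessionCount <= 1) return 1; // single session: full width
  if (sessionCount === 2) return 2; // two sessions: 50/50 split
  // Three or more: fit as many columns as the container allows.
  const fit = Math.floor(containerWidthPx / minTileWidthPx);
  return Math.max(1, Math.min(sessionCount, fit));
}
```

For example, five sessions in a 1200px container give three columns, while a 300px container falls back to a single column.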

Unified Sessions & History (#429)

  • History tab merged into Sessions page
  • Past sessions in collapsible section
  • Click any past session to open as new tile
  • Search and filter past sessions
  • Lazy loading with "Load More"

Multi-Session Controls

  • Panel hide button (#405) - Minimize entire floating panel when multiple sessions active
  • Sidebar session navigation (#438) - Click sidebar session to scroll to its tile
  • Continue from tiles (#408, #381) - Follow-up stays in same tile, no new window spawns
  • Active agents height limit (#406) - Prevents overlap with macOS window controls

Streaming Output (#388)

  • See LLM responses as they're generated in real-time
  • Live streaming display with animated cursor
  • Auto-scroll follows streaming content

🎯 Other Major Features

Built-in Settings Tools for Self-Configuration (#386)

The agent can now configure itself! 5 new built-in tools:

  • list_mcp_servers - View all configured MCP servers with status
  • toggle_mcp_server - Enable/disable MCP servers by name
  • list_profiles - View all profiles and which is active
  • switch_profile - Switch between profiles by ID or name
  • get_current_profile - Get current profile with full guidelines

Per-Profile MCP Server Configurations (#394)

  • Each profile stores its own MCP server settings
  • Automatically apply MCP config when switching profiles
  • Different profiles can have different enabled servers/tools

Editable Base System Prompt (#431)

  • Edit the base system prompt in Agent Settings
  • Per-profile system prompt storage
  • One-click restore to default prompt

Direct Response Support (#437)

  • Agent answers simple questions without forcing tool calls
  • Reduced latency for Q&A interactions
  • Smarter detection of when tools are actually needed

System Prompt Optimization (#432)

  • ~50% token reduction in system prompts
  • Removed dead code and redundant instructions
  • Cleaner, more focused prompts

🔧 UI/UX Improvements

Text/Voice Follow-up Inputs (#383, #376)

  • New input component in floating agent progress overlay
  • Text input field + Submit button + Voice button
  • Continue conversations via text OR voice

Profile Management Redesign (#425, #426)

  • Profile dropdown in sidebar with full management
  • Create, edit, delete, import/export from dropdown
  • Create new profile directly from dropdown

Always-On Agent Mode (#428)

  • MCP tools and agent mode now always enabled
  • Removed unnecessary toggle switches
  • Safety settings moved to General Settings

Tool Calling UX (#337)

  • Space to approve, Escape to deny tool calls
  • Better parameter previews
  • Hotkey hints on buttons

Other UI Fixes

  • Rate limit retry banner with countdown (#369)
  • Copy button for agent responses (#333)
  • Settings reorganization (#348, #362)
  • Submit hint shows 'Enter' for mic button (#396)

🐛 Bug Fixes

Agent & Session

  • Panel now focusable when agent completes (#435)
  • Text input responsive after agent finishes (#422)
  • Killswitch properly closes panel during MCP init (#340)
  • Final output summary expanded by default (#341)
  • Floating GUI shows on voice keybind (#375)
  • Scrollbars hidden until hover (#382)
  • Maximize button no longer creates blank panel (#372)

Profile & Settings

  • Profile text no longer truncated after save (#427)
  • Panel doesn't appear when continuing from tiles (#413)

LLM & Provider

  • Model preferences persist across sessions (#364)
  • Empty content with toolCalls accepted (#346)
  • Verifier JSON schema fixes (#347)
  • Cerebras API compatibility (#352)

Logging

  • Stopped logging entire conversation history (#409, #412)

Development

  • Metro bundler pnpm support (#418)
  • Auto-build Rust binary before dev (#419)

🧹 Code Quality

Testing

  • 56 new key-utils tests (#421)
  • Total tests: 32 → 88 (+175%)

Cleanup

  • Removed ~941 lines of unused code (#329)
  • Debug logging cleanup (#330)
  • DEBUGGING.md distilled from 425 to 48 lines (#374)

📊 Stats

  • 50+ PRs merged since v0.3.0
  • 100+ commits
  • 88 tests (up from 32)
  • Version: 0.3.0 → 1.0.0

🙏 Acknowledgments

Thanks to all users who provided feedback!

Key Pull Requests:

  • #438 - Sidebar session navigation
  • #437 - Direct response support
  • #435 - Panel focusability fix
  • #432 - System prompt optimization
  • #431 - Editable system prompt
  • #429 - Unified sessions & history
  • #428 - Always-on agent mode
  • #427 - Profile sync fix
  • #425 - Profile dropdown management
  • #421 - Key-utils tests
  • #417 - pnpm monorepo with mobile
  • #410 - Tile size persistence
  • #408 - Continue in same tile
  • #405 - Panel hide button
  • #401 - Mobile conversation continuity
  • #398 - Emergency stop endpoint
  • #397 - Final summary fixes
  • #394 - Per-profile MCP configs
  • #388 - Streaming output
  • #386 - Built-in settings tools
  • #383 - Follow-up inputs
  • #363 - Cloudflare tunnel
  • #359 - Tiling session dashboard

Full Changelog: v0.3.0...v1.0.0


Released: December 2025 | License: AGPL-3.0

v0.3.0 - Multi-Session Agent Support

24 Nov 22:27
d3cad06

⚠️ Platform Support

Windows and Linux do not currently support MCP tools in this release.

Windows and Linux users who want dictation-only functionality should use v0.2.2, which includes Windows (.exe) and Linux (.AppImage, .deb, .snap) builds.


🎯 Major Features

Multi-Session Agent Support (#264)

  • Run multiple agent sessions concurrently with independent progress tracking
  • Snooze/minimize sessions to background and restore from Active Agents sidebar
  • Per-session killswitch - stopping one session doesn't affect others
  • Eliminated "Initializing..." delay for new sessions
  • Fixed state machine violations to prevent generic "Processing..." states
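
The per-session killswitch can be pictured with a minimal sketch like the one below. It assumes one AbortController per session; the class and method names are hypothetical and are not taken from the SpeakMCP codebase.

```typescript
// Hypothetical sketch: one AbortController per session, so stopping one
// session leaves the others untouched. Not SpeakMCP's actual implementation.
class SessionRegistry {
  private controllers = new Map<string, AbortController>();

  // Register a session and return the signal its work should observe.
  start(sessionId: string): AbortSignal {
    const controller = new AbortController();
    this.controllers.set(sessionId, controller);
    return controller.signal;
  }

  // Abort a single session; returns false if the id is unknown.
  stop(sessionId: string): boolean {
    const controller = this.controllers.get(sessionId);
    if (!controller) return false;
    controller.abort();
    this.controllers.delete(sessionId);
    return true;
  }

  // Ids of sessions that have been started and not yet stopped.
  activeIds(): string[] {
    return Array.from(this.controllers.keys());
  }
}
```

Long-running work (LLM requests, tool calls) would check or forward the returned signal, so aborting one session does not touch the signals held by other sessions.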

🔧 Improvements

Session Management (#239)

  • Voice input now defaults to creating new sessions for simplified workflow
  • Configurable via alwaysCreateNewSessionForVoice setting
  • Text input continues to support conversation continuation

UI/UX Enhancements

  • Voice input waveform now visible when agent is running (#260)
  • Text input automatically receives focus when spawned via keybind (Ctrl+T) (#258)
  • Improved visibility of MCP transport type dropdown (#257)
  • Assistant message and thinking block expansion state now persists when new messages arrive (#274)
  • Each <think> section now has a unique ARIA ID for accessibility

Tool Management

  • Tools from deleted MCP servers now properly removed from UI (#114)
  • Tool call requests appear immediately in agent progress before responses (#202)
  • Added Playwright MCP server as example for browser automation (#287)

Context Management (#304)

  • Intelligent tool response processing to prevent context overflow
  • Server-aware summarization (different strategies for Playwright, Desktop Commander, GitHub)
  • Configurable thresholds (defaults: 20KB large, 50KB critical)
  • Real-time progress feedback during large response processing
  • UI now shows "Summarizing context (1/4)" during long operations instead of appearing frozen

Performance

  • Replaced polling with push-based events for Active Agents Sidebar (#298)
  • Immediate UI updates, reduced log spam, lower CPU usage
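
As a rough illustration of the push-based approach, the sketch below lets the UI subscribe once and receive updates only when something changes, instead of polling on a timer. All names are hypothetical; the real sidebar wiring is not shown in these notes.

```typescript
// Hypothetical sketch of push-based updates: subscribers are called only
// when the set of active sessions changes, so no polling timer is needed.
type SessionListener = (activeSessionIds: string[]) => void;

class SessionEvents {
  private listeners = new Set<SessionListener>();

  // Returns an unsubscribe function for cleanup when the view is removed.
  subscribe(listener: SessionListener): () => void {
    this.listeners.add(listener);
    return () => {
      this.listeners.delete(listener);
    };
  }

  // Called by the producer whenever the session list changes.
  publish(activeSessionIds: string[]): void {
    this.listeners.forEach((listener) => listener(activeSessionIds));
  }
}
```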

🐛 Bug Fixes

OAuth & Deep Links

  • Fixed OAuth deep link callbacks on Windows and Linux by registering speakmcp:// protocol (#259, #225)

Agent State Management

  • Kill switch now properly resets all agent state variables (#241)
  • Old agent messages no longer appear when starting new sessions
  • Session context no longer leaks between sessions (#294)
  • Voice input no longer loads conversation history from previous sessions (#299)
  • Completed sessions no longer block voice dictation (#301, #303)
  • Waveform no longer shows unexpectedly after agent finishes (#292)

Display & Formatting

  • Tool call expansion shows complete details instead of basic summaries (#142)
  • TTS button state synchronization fixed (#140)
  • Post-processing now auto-appends the transcript when the {transcript} placeholder is missing (#93)
  • Waveform positioning fixed to span full width while centered (#109)
  • Hide/Show buttons now work correctly in tool execution view (#289)
  • Session tabs now display correct titles (#288)
  • Progress UI shows immediately after voice input submission (#288)
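
The {transcript} fallback from #93 can be sketched as follows. The function name and the blank-line separator used when appending are assumptions for illustration only.

```typescript
// Hypothetical sketch: substitute the placeholder when present, otherwise
// append the transcript so it is never silently dropped.
const TRANSCRIPT_PLACEHOLDER = "{transcript}";

function buildPostProcessingPrompt(template: string, transcript: string): string {
  if (template.indexOf(TRANSCRIPT_PLACEHOLDER) !== -1) {
    // split/join replaces every occurrence without regex special cases.
    return template.split(TRANSCRIPT_PLACEHOLDER).join(transcript);
  }
  // Assumed separator: a blank line between the prompt and the transcript.
  return template + "\n\n" + transcript;
}
```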

LLM Handling

  • Graceful fallback on empty LLM responses (#156)
  • Parse non-standard reasoning field from providers like OpenRouter
  • Fixed OpenAI-compatible provider model discovery
  • Verifier no longer causes infinite loops on impossible tasks (#304)
  • Panel UI crashes fixed with null safety guards (#304)

🔧 New Configuration Options

// Tool response processing (src/main/config.ts)
mcpToolResponseProcessingEnabled: true
mcpToolResponseLargeThreshold: 20000      // 20KB
mcpToolResponseCriticalThreshold: 50000   // 50KB
mcpToolResponseChunkSize: 15000
mcpToolResponseProgressUpdates: true
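
One plausible reading of how these options interact is sketched below. The option names come from the list above; the classification labels, the use of character length rather than byte size, and the inclusive comparisons are assumptions.

```typescript
// Hypothetical sketch of threshold-based handling. Only the option names
// are from the release notes; the logic is an illustrative assumption.
interface ToolResponseConfig {
  mcpToolResponseProcessingEnabled: boolean;
  mcpToolResponseLargeThreshold: number;
  mcpToolResponseCriticalThreshold: number;
  mcpToolResponseChunkSize: number;
}

const defaultToolResponseConfig: ToolResponseConfig = {
  mcpToolResponseProcessingEnabled: true,
  mcpToolResponseLargeThreshold: 20000,
  mcpToolResponseCriticalThreshold: 50000,
  mcpToolResponseChunkSize: 15000,
};

type ResponseSizeClass = "normal" | "large" | "critical";

// Classify a response by length (characters here, as a simplification).
function classifyResponse(
  text: string,
  config: ToolResponseConfig = defaultToolResponseConfig
): ResponseSizeClass {
  if (!config.mcpToolResponseProcessingEnabled) return "normal";
  if (text.length >= config.mcpToolResponseCriticalThreshold) return "critical";
  if (text.length >= config.mcpToolResponseLargeThreshold) return "large";
  return "normal";
}

// Split an oversized response into fixed-size chunks for summarization.
function splitIntoChunks(text: string, chunkSize: number): string[] {
  const chunks: string[] = [];
  for (let i = 0; i < text.length; i += chunkSize) {
    chunks.push(text.slice(i, i + chunkSize));
  }
  return chunks;
}
```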

🧹 Code Quality

Refactoring

  • Conversations section renamed to History for better semantic clarity (#149)
  • Removed unnecessary "Resume auto-scroll" indicator (#158)
  • Removed 'done', 'esc', and 'details' buttons from agent progress (#146)
  • Added comprehensive tool calls test suite (#256)
  • Improved TypeScript type safety with type-only imports
  • Added comprehensive debug logging throughout

🔒 Security & Compatibility

  • All changes maintain backward compatibility
  • No breaking changes to existing functionality
  • Enhanced error handling throughout

📊 Stats

  • 60+ commits since v0.2.3
  • 25+ PRs merged

🙏 Acknowledgments

Thanks to all users who provided feedback!

Key Pull Requests:

  • #264 - Multi-session agent support
  • #304 - Context overflow prevention and UI stability
  • #298 - Push-based events for sidebar
  • #294 - Session context isolation
  • #292 - Recording cleanup fixes
  • #289 - Tool execution UI fixes
  • #288 - Session tabs and progress UI
  • #287 - Playwright MCP example
  • #274 - Expansion state persistence
  • #260 - Waveform visibility fix
  • #259 - OAuth deep links for Windows/Linux
  • #258 - Text input focus fix
  • #257 - MCP dropdown visibility
  • #256 - Tool calls test suite

Full Changelog: v0.2.3...v0.3.0


Released: November 2025 | License: AGPL-3.0

v0.2.3

02 Nov 16:54
f1efb66

SpeakMCP v0.2.3

🎉 What's New

🤖 LLM & Model Improvements

Auto-Detection of Model Capabilities (#229)

  • Self-Learning System: Automatically detects which models support structured output (JSON Schema/Object)
  • Runtime Cache: Learns from actual usage and caches capabilities for 24 hours
  • No Configuration Needed: Works with any new model automatically without hardcoding
  • Fixes Infinite Retry Loops: Resolves issues with models like google/gemini-2.5-flash getting stuck
  • Enhanced Debug Logging: 11 new debug points throughout LLM request/response flow (enable with DEBUG_LLM=1)
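
A runtime cache with a 24-hour lifetime could look roughly like the sketch below. The class, its fields, and the idea of storing a single boolean per model are assumptions, not the actual SpeakMCP data model.

```typescript
// Hypothetical sketch of a capability cache with a 24-hour expiry.
const CAPABILITY_TTL_MS = 24 * 60 * 60 * 1000;

interface CapabilityEntry {
  supportsStructuredOutput: boolean;
  recordedAt: number; // epoch milliseconds
}

class ModelCapabilityCache {
  private entries = new Map<string, CapabilityEntry>();

  // Record what was learned from an actual request to the model.
  record(model: string, supportsStructuredOutput: boolean, now: number = Date.now()): void {
    this.entries.set(model, { supportsStructuredOutput, recordedAt: now });
  }

  // Returns undefined when nothing is known or the entry has expired,
  // which signals the caller to probe the model again.
  lookup(model: string, now: number = Date.now()): boolean | undefined {
    const entry = this.entries.get(model);
    if (!entry) return undefined;
    if (now - entry.recordedAt >= CAPABILITY_TTL_MS) {
      this.entries.delete(model);
      return undefined;
    }
    return entry.supportsStructuredOutput;
  }
}
```

A caller would try structured output when lookup returns true or undefined, record the outcome, and skip straight to the plain-text fallback when it returns false.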

Enhanced Error Handling (#221)

  • Novita AI Support: Fixed generic "model inference" errors from Novita and similar providers
  • Improved Fallback Chain: Better detection of structured output errors triggers proper fallback
  • Conversation Loading Fix: Resolved TypeError when loading conversations with tool results

Cloudflare 524 Timeout Handling

  • Improved Retry Logic: 524 timeout errors now properly treated as retryable
  • Better Error Detection: Enhanced detection for gateway and Cloudflare-specific errors
  • User-Friendly Messages: Clear console output for retry progress
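
In code, treating 524 as retryable might look like the sketch below. Only the 524 case comes from these notes; the other status codes and the exponential backoff parameters are assumptions chosen for illustration.

```typescript
// Hypothetical sketch: 524 (Cloudflare timeout) is grouped with other
// commonly retried statuses. The exact list is an assumption.
const RETRYABLE_STATUS_CODES = [408, 429, 500, 502, 503, 504, 524];

function isRetryableStatus(status: number): boolean {
  return RETRYABLE_STATUS_CODES.indexOf(status) !== -1;
}

// Exponential backoff with a cap; attempt is zero-based.
function backoffDelayMs(attempt: number, baseMs: number = 1000, maxMs: number = 30000): number {
  return Math.min(maxMs, baseMs * Math.pow(2, attempt));
}
```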

🛠️ MCP (Model Context Protocol) Enhancements

MCP Initialization Progress Feedback (#224)

  • Real-Time Updates: Shows which server is being initialized with progress count
  • Visual Feedback: Clear "Initializing MCP tools" message with server names
  • Polling Updates: UI updates every 500ms during initialization
  • Seamless Transition: Automatically proceeds to agent mode when ready
  • Fixes #218

MCP Server Configuration UX (#ddcc8aa)

  • Simplified Command Input: Single field for full command (e.g., npx -y @server/name)
  • Auto-Connect: Newly added servers automatically connect with status notifications
  • Fixed Form Persistence: Resolved bug where old values persisted when switching modes
  • Better Shell Parsing: Handles quoted paths and spaces in commands correctly
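
Shell-like parsing of a single command field can be approximated with a small tokenizer such as the one below. It is a simplified sketch with a hypothetical function name: it handles single and double quotes but not backslash escapes or variable expansion.

```typescript
// Hypothetical sketch: split a command line into arguments, keeping
// quoted segments (including spaces) together. No backslash escapes.
function parseCommand(input: string): string[] {
  const tokens: string[] = [];
  let current = "";
  let quote: string | null = null; // the quote character currently open
  let hasToken = false; // true once the current token has started

  for (let i = 0; i < input.length; i++) {
    const ch = input.charAt(i);
    if (quote !== null) {
      if (ch === quote) {
        quote = null; // closing quote ends the quoted segment
      } else {
        current += ch;
      }
    } else if (ch === '"' || ch === "'") {
      quote = ch;
      hasToken = true; // allows empty quoted arguments such as ""
    } else if (ch === " " || ch === "\t") {
      if (hasToken) {
        tokens.push(current);
        current = "";
        hasToken = false;
      }
    } else {
      current += ch;
      hasToken = true;
    }
  }
  if (hasToken) tokens.push(current);
  return tokens;
}
```

The first token would be used as the executable and the rest as its arguments.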

MCP Server Logging

  • Capture Diagnostic Logs: View server output directly in UI
  • Collapsible Log Viewer: Terminal-style display on MCP config cards
  • Circular Buffer: In-memory log storage with clear functionality
  • Clean UI: No [stderr] labels, just timestamp + message
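
A circular buffer for in-memory logs might be structured as in the sketch below. The entry shape (timestamp plus message) follows the description above; the class name and capacity handling are hypothetical, and capacity is assumed to be a positive integer.

```typescript
// Hypothetical sketch of a fixed-capacity circular log buffer: once full,
// each new entry overwrites the oldest one.
interface ServerLogEntry {
  timestamp: number;
  message: string;
}

class LogRingBuffer {
  private readonly slots: (ServerLogEntry | undefined)[];
  private next = 0; // index where the next entry will be written
  private count = 0; // number of valid entries currently stored

  constructor(capacity: number) {
    this.slots = new Array<ServerLogEntry | undefined>(capacity);
  }

  push(message: string, timestamp: number = Date.now()): void {
    this.slots[this.next] = { timestamp, message };
    this.next = (this.next + 1) % this.slots.length;
    this.count = Math.min(this.count + 1, this.slots.length);
  }

  // Entries in order from oldest to newest.
  snapshot(): ServerLogEntry[] {
    const result: ServerLogEntry[] = [];
    const size = this.slots.length;
    const start = (this.next - this.count + size) % size;
    for (let i = 0; i < this.count; i++) {
      const entry = this.slots[(start + i) % size];
      if (entry) result.push(entry);
    }
    return result;
  }

  clear(): void {
    for (let i = 0; i < this.slots.length; i++) this.slots[i] = undefined;
    this.next = 0;
    this.count = 0;
  }
}
```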

🎯 Agent Mode Improvements

Stuck Loading State Fixes (#222, #223)

  • Visible Killswitch Button: Always available while processing, with a confirmation dialog
  • Enhanced Keyboard Shortcuts: Work regardless of internal state flags
  • Mutation State Reset: Properly cleans up React Query mutations on emergency stop
  • Comprehensive Cleanup: Resets all processing states, ends conversations, stops TTS
  • Fixes #216

Tool Call Display Enhancements

  • Immediate Tool Call Display (#202): Tool requests appear before responses, not after
  • Stable Expansion State (#209): Tool calls remain expanded when new messages arrive
  • Content-Based IDs: Stable hashing prevents expansion state loss
  • Full Tool Details (#142): Complete parameters and results with JSON formatting
  • Pending State Indicators (#71fa447): Clear "Pending..." badge while waiting for responses
  • Fixes #196, #201
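
Content-based IDs can be illustrated with a small stable hash, as sketched below. The choice of FNV-1a (applied to UTF-16 code units), the ID prefix, and the fields that feed the hash are assumptions; the release notes do not say which hash SpeakMCP uses.

```typescript
// Hypothetical sketch: derive a tool call's UI id from its content, so the
// same call keeps the same id (and expansion state) across re-renders.
function fnv1aHex(text: string): string {
  let hash = 0x811c9dc5; // FNV-1a 32-bit offset basis
  for (let i = 0; i < text.length; i++) {
    hash ^= text.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193); // FNV-1a 32-bit prime
  }
  const hex = (hash >>> 0).toString(16);
  return ("00000000" + hex).slice(-8);
}

function toolCallId(toolName: string, args: unknown, index: number): string {
  // JSON.stringify depends on key insertion order; callers that build
  // arguments in varying order would need to sort keys first.
  const payload = index + ":" + toolName + ":" + JSON.stringify(args === undefined ? null : args);
  return "tool-" + fnv1aHex(payload);
}
```

Because the id depends only on the call's position, name, and arguments, new messages arriving later do not change it.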

Conversation History Improvements (#210)

  • Complete History Saved: All messages, tool calls, and results preserved
  • Accurate Timestamps: Original message timestamps maintained
  • Proper Ordering: Message sequence correctly preserved
  • Fixes #195

Empty Response Handling (#52a92c1)

  • Graceful Fallback: Handles null/empty LLM responses without crashing
  • Case-Insensitive Detection: Better error pattern matching
  • Continued Execution: Logs errors and continues instead of crashing
  • Fixes #172

🎨 UI/UX Enhancements

Profile Management System (#3760558)

  • Save/Load Profiles: Create named profiles with custom guideline configurations
  • 3 Default Profiles: Default, Git & Version Control, AI Coding Agent
  • Import/Export: Share profiles between installations
  • Persistent Storage: Profiles saved and restored across sessions
  • Fixes #199

Waveform & Panel Improvements (#205)

  • Size Matching: Panel automatically sizes to accommodate full waveform (70 bars)
  • Persistent Resize: User resize preferences saved per mode (normal/agent/textInput)
  • Minimum Width: Enforced ~172px minimum to prevent waveform cutoff
  • Aerospace/niri Support: Proper floating behavior in tiling window managers
  • Fixes #203, #186

Settings & Navigation

  • Settings Menu Restored (#ffbde3c): Added back to the tray menu, without the non-functional keybind
  • History Section (#149): Renamed "Conversations" to "History" for better clarity
  • Model Selector Focus Fix (#192): Prevents focus loss while typing
  • Fixes #194, #197

🔊 TTS (Text-to-Speech) Improvements

Enhanced Kill Switch (#193, #188)

  • Global TTS Manager: Centralized control for all audio elements
  • Emergency Stop Integration: Stops TTS on kill switch and ESC key
  • Auto-Play Prevention: Blocks TTS after agent termination
  • Stop TTS Button: Visible control in settings sidebar
  • Playing State Tracking: Button only shows when TTS is actively playing

CORS Support for Remote Server (#881512f)

  • Configurable Origins: Defaults to * for development
  • Preflight Handling: Skip auth for OPTIONS requests
  • UI Controls: Manage CORS settings in remote server configuration
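
The preflight behavior can be sketched independently of any HTTP framework, as below. The types, function name, and response headers are hypothetical; the Bearer-token check reflects the authentication scheme described for the remote server in v0.2.2.

```typescript
// Hypothetical sketch: answer OPTIONS preflights before the auth check,
// since browsers send preflight requests without an Authorization header.
interface IncomingRequest {
  method: string;
  origin?: string;
  authorization?: string;
}

interface RequestDecision {
  status: number;
  headers: { [name: string]: string };
}

function decideRequest(req: IncomingRequest, allowedOrigins: string[], apiKey: string): RequestDecision {
  const headers: { [name: string]: string } = {};
  if (allowedOrigins.indexOf("*") !== -1) {
    headers["Access-Control-Allow-Origin"] = "*";
  } else if (req.origin !== undefined && allowedOrigins.indexOf(req.origin) !== -1) {
    headers["Access-Control-Allow-Origin"] = req.origin;
    headers["Vary"] = "Origin";
  }

  if (req.method === "OPTIONS") {
    headers["Access-Control-Allow-Methods"] = "GET, POST, OPTIONS";
    headers["Access-Control-Allow-Headers"] = "Authorization, Content-Type";
    return { status: 204, headers: headers }; // preflight: no auth required
  }

  if (req.authorization !== "Bearer " + apiKey) {
    return { status: 401, headers: headers };
  }
  return { status: 200, headers: headers };
}
```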

🐛 Bug Fixes

  • Tool Execution IDs (#eb00cae): Handle undefined arguments without crashing
  • MCP Server Cleanup (#0606d70): Emergency stop no longer kills persistent MCP servers
  • Quoted Paths (#daaea0b): Proper shell-like parsing for commands with spaces
  • Linux Desktop Integration (#bc11c72): Fixed app menu appearance, icons, and PATH symlink
  • Linux Startup Notification (#9b7c554): Disabled distracting "SpeakMCP is ready" popup
  • macOS Floating Panel (#8d6f72f): Proper z-order and focusability for Aerospace compatibility

🔧 Technical Improvements

Developer Experience

  • Node Version Pinning (#306dc17): Added .nvmrc specifying Node v20.19.5
  • Worktree Setup Script (#c69e2b7): Fast worktree setup (~30sec vs ~3min)
  • UI Debug Mode (#192): Track focus, renders, and state changes with DEBUG_UI=1
  • Enhanced Logging: Comprehensive debug output throughout codebase

Code Quality

  • Optional Chaining (#ec71fc5): Replaced non-null assertions for better resilience
  • CodeRabbit Suggestions: Addressed review feedback across multiple PRs
  • TypeScript Fixes: Resolved type errors and improved type safety
  • Defensive Programming: Added guards and fallbacks throughout

📦 Downloads

Platform Support: macOS (Apple Silicon & Intel)

macOS Builds

  • DMG: SpeakMCP-0.2.3-arm64.dmg (102M) | SpeakMCP-0.2.3-x64.dmg (109M)
  • PKG: SpeakMCP-0.2.3-arm64.pkg (101M) | SpeakMCP-0.2.3-x64.pkg (109M)
  • ZIP: SpeakMCP-0.2.3-arm64.zip (100M) | SpeakMCP-0.2.3-x64.zip (108M)

📥 Download Latest Release

🔄 Migration Notes

  • No breaking changes - All existing functionality preserved
  • Automatic migration - Settings and data migrate seamlessly
  • New features opt-in - All new features work with existing configurations
  • Backward compatible - Existing API endpoints and data structures unchanged

📝 Technical Details

  • 50+ commits since v0.2.2
  • Version: 0.2.2 → 0.2.3
  • Rust crate: Updated to 0.2.3
  • Node.js: Pinned to v20.19.5

🙏 Acknowledgments

Thanks to all contributors and users who provided feedback!

Key Pull Requests:

  • #229 - Auto-detection for model capabilities
  • #224 - MCP initialization progress feedback
  • #222 - Fix stuck loading state in agent mode
  • #221 - Improve error handling for providers
  • #210 - Save complete conversation history
  • #205 - Waveform size matching and persistence
  • #193 - Enhanced TTS kill switch
  • #192 - Model selector focus fix

Issues Closed:

  • #218 - MCP initialization feedback
  • #216 - Stuck loading state
  • #203 - Waveform size issues
  • #199 - Profile management
  • #197 - Settings menu restoration
  • #196 - Tool call expansion state
  • #195 - Conversation history
  • #194 - Settings menu removal
  • #188 - TTS kill switch
  • #186 - Waveform rendering
  • #172 - Empty response handling

Full Changelog: v0.2.2...v0.2.3


Released: November 2025 | License: AGPL-3.0

v0.2.2 - Stable Windows & Linux Builds

17 Oct 03:04

SpeakMCP v0.2.2

🎉 What's New

🌐 Remote Server API (Phase 1)

Transform SpeakMCP into an API-accessible AI agent service!

  • OpenAI-Compatible HTTP Server with /v1/chat/completions and /v1/models endpoints
  • Secure Bearer Token Authentication with auto-generated API keys
  • Full Agent Mode Support - Run MCP tools via HTTP API
  • Flexible Configuration - Configurable port, bind address, and logging
  • New Settings UI - Manage server, API keys, and view usage instructions

Example Usage:

curl -X POST http://127.0.0.1:3210/v1/chat/completions \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"messages":[{"role":"user","content":"List my files"}]}'
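
The same request can be issued from TypeScript, as in the sketch below. The endpoint path, port, and headers mirror the curl example; the helper names are hypothetical, and the response is left untyped because its exact shape is not documented here.

```typescript
// Hypothetical client sketch for the remote server's chat endpoint.
interface ChatMessage {
  role: "system" | "user" | "assistant";
  content: string;
}

// Build the URL and fetch options separately so they can be inspected.
function buildChatRequest(baseUrl: string, apiKey: string, messages: ChatMessage[]) {
  return {
    url: baseUrl.replace(/\/+$/, "") + "/v1/chat/completions",
    init: {
      method: "POST",
      headers: {
        Authorization: "Bearer " + apiKey,
        "Content-Type": "application/json",
      },
      body: JSON.stringify({ messages: messages }),
    },
  };
}

// Requires a runtime with a global fetch (for example Node 18 or later).
async function sendChat(baseUrl: string, apiKey: string, messages: ChatMessage[]): Promise<unknown> {
  const request = buildChatRequest(baseUrl, apiKey, messages);
  const response = await fetch(request.url, request.init);
  if (!response.ok) {
    throw new Error("Remote server returned HTTP " + response.status);
  }
  return response.json();
}
```

For example, sendChat("http://127.0.0.1:3210", "YOUR_API_KEY", [{ role: "user", content: "List my files" }]) sends the same payload as the curl command.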

🪟 Windows Build Improvements

  • ✅ Fixed Fastify module loading errors
  • ✅ Custom application icon for Windows
  • ✅ Improved native dependency handling
  • ✅ Production-ready Windows builds

🎯 Agent Control Enhancements

  • Visual Kill Switch - Stop individual agents with red X button in progress window
  • Confirmation Dialog - Prevents accidental termination
  • Comprehensive Cleanup - Aborts LLM requests, stops MCP servers, kills processes
  • Visual Feedback - Clear "Stopped" status with terminated badge

🧪 Testing Infrastructure

  • VNC GUI Testing for GitHub Actions
  • Automated GUI testing in CI/CD
  • Comprehensive testing documentation

🔧 Improvements

  • MCP Tool Counter - Display total count of enabled tools (#182)
  • Remote Server Settings - Improved UI with better layout
  • macOS Build Fixes - Resolved codesign and architecture issues (#181)
  • Better Error Handling - Improved stability across platforms

📦 Downloads

Cross-Platform Support: macOS (Apple Silicon & Intel), Windows (x64), Linux (x64)

📥 Download Latest Release

🔄 Migration Notes

  • No breaking changes - All existing functionality preserved
  • Remote server disabled by default - Enable in Settings → Remote Server
  • Automatic migration - Settings and data migrate seamlessly

📝 Technical Details

  • 39 files changed: +6,157 additions, -1,021 deletions
  • New dependencies: Fastify for HTTP server
  • Version: 0.2.1 → 0.2.2

🐛 Bug Fixes

  • Fixed Windows Fastify module loading
  • Resolved macOS codesign timestamp issues
  • Fixed architecture compatibility on macOS
  • Improved native extension handling

📚 Documentation

  • Remote Server Guide: docs/remote-server-phase-1.md
  • VNC Testing: .github/VNC_QUICK_START.md
  • Updated README with new features

🙏 Acknowledgments

Thanks to all contributors and users who provided feedback!

Pull Requests:

  • #176 - Visual kill switch for agent progress windows
  • #166 - Remote Server Phase 1: OpenAI-compatible HTTP API

Issues Closed:

  • #182 - Add total tools enabled counter
  • #181 - Build launch architecture error
  • #175 - Visual kill switch request

Full Changelog: v0.2.1...v0.2.2


Released: October 2025 | License: AGPL-3.0

v0.2.1

12 Sep 23:49

Full Changelog: v0.2.0...v0.2.1