20 Feb 21:09

aj47

41dc4a6

SpeakMCP v1.4.0 Latest


SpeakMCP v1.4.0

New Features

Mobile Settings Parity: Added settings for skills, memories, personas, and loops on mobile
Supertonic TTS: Added Supertonic as a new local text-to-speech option
Past Sessions Modal: Moved past sessions from sidebar to a modal with delete support
Agent Spoken Output: Added explicit respond_to_user spoken-output flow for multi-channel support
Skills in Prompts: Enabled skills now appear in the predefined prompts dropdown menu
Ephemeral Messages: Internal completion nudges are now hidden via the ephemeral message system
Continue Conversation Shortcuts: Added Shift+hotkey keybinds to continue last conversation
Unlimited Agent Runs: Added option to disable max iteration limit for unlimited agent loops
Transcription Preview: Added opt-in live transcription preview during recording
Mobile Session Sync: Sync sessions between mobile and desktop with lazy loading
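The "Unlimited Agent Runs" option amounts to making the iteration cap optional. A minimal sketch, assuming a hypothetical `step` callback that resolves to `true` when the model signals completion and a hypothetical `shouldStop` check standing in for the kill switch; this is an illustration, not SpeakMCP's actual agent loop:

```typescript
// Hypothetical agent step: resolves to true when the model signals it is done.
type AgentStep = (iteration: number) => Promise<boolean>;

interface AgentLoopOptions {
  // null disables the cap entirely (the "unlimited" setting).
  maxIterations: number | null;
  // Hypothetical stand-in for the kill switch.
  shouldStop?: () => boolean;
}

// Runs steps until completion, the kill switch, or the optional cap; returns iterations used.
async function runAgentLoop(step: AgentStep, opts: AgentLoopOptions): Promise<number> {
  let iteration = 0;
  while (opts.maxIterations === null || iteration < opts.maxIterations) {
    if (opts.shouldStop?.()) break;
    iteration++;
    if (await step(iteration)) break;
  }
  return iteration;
}
```

With the cap removed, only the model's completion signal or the kill switch ends the loop, which is why an interruptible kill switch matters more in unlimited mode.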

Bug Fixes

Fixed React deduplication and streaming repetition bugs on mobile
Fixed duplicate assistant messages and preserved mobile progress on empty history
Fixed TTS playback issues: stale generation results, double playback on remount, cleanup timing
Fixed streaming state getting stuck in "generating" when throttle drops events
Fixed sidebar collapse button spacing on macOS
Fixed search icon alignment and standardized icon-text spacing across UI
Fixed dialog grid children width overflow
Fixed input draft sync overwriting text the user is typing
Fixed mobile transcription: word-boundary matching, StrictMode compat, focus timeouts
Fixed hold-mode race conditions in continue-conversation shortcuts
Fixed WebM to float32 PCM decoding for Parakeet STT
Made network retry delays interruptible by the kill switch
Properly bundled sherpa-onnx native packages in the packaged app
Gated SpeakMCP-specific fetches behind isSpeakMCPServer check
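The fix for streaming state stuck in "generating" comes down to one rule: a throttle may drop intermediate progress events, but never a terminal one. A minimal sketch, assuming a hypothetical `ProgressEvent` shape, one throttle instance per session, and an injectable clock for testing; SpeakMCP's actual event types and throttle are not shown in these notes:

```typescript
// Hypothetical event shape; SpeakMCP's real progress events are richer.
interface ProgressEvent {
  sessionId: string;
  status: "generating" | "done" | "error";
  text: string;
}

// Throttles intermediate "generating" updates but always delivers terminal ones,
// so a dropped event can never leave the UI stuck in "generating".
function createProgressThrottle(
  emit: (e: ProgressEvent) => void,
  intervalMs: number,
  now: () => number = Date.now,
) {
  let lastEmit = -Infinity;
  let pending: ProgressEvent | null = null;

  function send(e: ProgressEvent): void {
    lastEmit = now();
    pending = null;
    emit(e);
  }

  return {
    push(e: ProgressEvent): void {
      if (e.status !== "generating") {
        send(e); // terminal: discard any buffered update and emit immediately
      } else if (now() - lastEmit >= intervalMs) {
        send(e);
      } else {
        pending = e; // keep only the newest intermediate update
      }
    },
    // Call from a timer tick to deliver the newest buffered update, if any.
    flush(): void {
      if (pending !== null) send(pending);
    },
  };
}
```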

Improvements

Consolidated duplicated summarization settings UI
Session tiles now fill available vertical space
Final assistant messages expand by default in tile/sessions view
Grid view click now resets tile layout (removed separate reset button)
Per-message TTS on mobile

Downloads

macOS (Apple Silicon): SpeakMCP-1.4.0-arm64.dmg - Signed and notarized
Android: SpeakMCP-1.4.0.apk


17 Feb 04:01

aj47

v1.3.0

e216769

v1.3.0

SpeakMCP v1.3.0

🚀 New Features

Agent Loops System — Schedule agents to run with specific prompts at regular intervals (#1036)
Terminal QR Code — QR code rendering for mobile pairing in headless/SSH/terminal environments (#1025)
Collapsible Queued Messages — Message queue panel is now collapsible and height-limited (#1042)
Rapid Fire Voice (Mobile) — Hold-to-speak rapid fire voice input with large mic button (#1024)
Conversation Keybinds — Shift+hotkey shortcuts for conversation navigation (#1021)
Supertonic TTS — New TTS provider support (#1000)
Disable Max Iteration Limit — Option to remove agent iteration cap (#1017)
Google Assistant Integration (Mobile) — Trigger SpeakMCP via "Hey Google" App Actions (#1022)
File-Based Dynamic Context Discovery — Efficient token usage via file-synced MCP tools, profiles, and skills (#897)
Debug Logging — Comprehensive debug logging feature with file management and UI (#178)
Persistent Cloudflare Tunnel URL — Store and display last known tunnel URL for reconnection (#723)
Bundled Electron MCP Skill — Bootstrap skill for local Electron MCP server (#1056)
Standalone Server Package — Central server architecture for multi-client support (#790, #791)

🐛 Bug Fixes

Desktop Kill Switch — Fixed kill switch not stopping agent sessions; now session-aware with interruptible retry delays (#1058, #255, #1023)
Mobile Session Sync — Fixed mobile sessions failing to load messages on desktop (#1059)
Mobile Session Polling — Added foreground session polling so new desktop sessions are detected (#1055)
Follow-up Message Display — Messages now appear immediately after session stop (#1057)
Claude Models via OpenRouter — Prevent assistant message prefill error (#1037)
Mobile Streaming — Fixed double words in streaming responses (#1028)
Model Search Sticky — Search input stays sticky in settings dropdown (#1041)
Waveform Height — Increased waveform visualization height, prevent excessive shrink (#1052)
Settings Sidebar Scroll — Settings panel now scrolls with sidebar (#1051)
Mobile Rapid Fire UX — Improved session visibility and voice feedback (#1038)
Verification Corruption — Prevent verification from corrupting final agent response (#1050)
Renderer Crash Recovery — Auto-recover from GPU/renderer process crashes (#810)
Mobile Network Failures — Graceful error handling with retry for app backgrounding (#489)
Hardened Runtime — Enabled by default for macOS permission persistence (#847)

⚡ Performance

Async Index + Throttled Progress — Async+debounced conversation index writes & throttled progress emits for dramatically reduced main process blocking (#1060)

🎨 UI Improvements

Sessions Sort by Last Modified (#1016)
Session Tiles Fill Vertical Space (#1018)
Search Bar Model Dropdown Fix (#1015)
Groq API Pricing Research (#1009)
Improved Agent Documentation (#1010)

📦 Downloads

Platform	File
macOS (Apple Silicon)	`SpeakMCP-1.3.0-arm64.dmg`
macOS (Intel)	`SpeakMCP-1.3.0-x64.dmg`
Android	`SpeakMCP-1.3.0.apk`

Note: macOS DMGs are code-signed and notarized by Apple for safe installation.


08 Jan 01:19

aj47

v1.2.0

13c3ed1

v1.2.0

🎯 Major Features

🎭 Agent Personas & Multi-Agent Delegation (#920)

Agent Personas - Create specialized AI personas with custom system prompts, tools, and skills
Delegation System - Route tasks to specialized sub-agents based on expertise
Agent Profile Management - Full CRUD for managing delegation targets
Internal & External Agents - Support for both built-in personas and external agents
Settings UI - Configure at Settings → Agent Personas

🤖 External ACP Agents (#894, #920)

ACP Protocol Support - Connect to external AI agents via Agent Client Protocol
Claude Code Integration - Delegate coding tasks to Claude Code
Auggie Support - Connect to Augment's Auggie agent
Multiple Transport Types:
- stdio - Spawn local agent processes
- remote - Connect to HTTP endpoints
- internal - Built-in delegation within SpeakMCP
Delegation Tools:
- list_available_agents - Discover available specialized agents
- delegate_to_agent - Route tasks to specific agents
- get_delegation_status - Check on delegated task progress
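The three transport types can be modelled as a discriminated union, so each delegation target carries exactly the settings its transport needs. A minimal TypeScript sketch; the field names (`command`, `baseUrl`, `personaId`) and the example agent names are hypothetical, not SpeakMCP's actual settings schema:

```typescript
// Hypothetical config for each transport type listed above.
type AgentTransport =
  | { kind: "stdio"; command: string; args: string[] } // spawn a local agent process
  | { kind: "remote"; baseUrl: string }                // call an HTTP endpoint
  | { kind: "internal"; personaId: string };           // delegate to a built-in persona

interface DelegationTarget {
  name: string;
  transport: AgentTransport;
}

// Describes how a task sent via delegate_to_agent would be routed to the target.
function describeRoute(target: DelegationTarget): string {
  const t = target.transport;
  switch (t.kind) {
    case "stdio":
      return ["spawn", t.command, ...t.args].join(" ");
    case "remote":
      return `HTTP request to ${t.baseUrl}`;
    case "internal":
      return `run in-process persona ${t.personaId}`;
  }
}
```

Because the switch is exhaustive over the union, adding a fourth transport kind makes the compiler flag every routing function that does not handle it yet.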

🧠 Dual-Model Agent Mode (#919)

Strong Model for Planning - Use powerful models for complex reasoning
Weak Model for Summarization - Use faster/cheaper models for UI summaries
Agent Summary View - See compact summaries of agent progress
Memory System - Save important findings to persistent memory files
Settings UI - Configure at Settings → Providers & Models

💾 Agent Memory System (#919, #963, #975)

Persistent Memories - Save key information across sessions
Ultra-Compact Format - Single-line memories for efficiency
Agent Memory Tools:
- list_memories - View saved memories
- save_memory - Create new memories
- delete_memory - Remove memories
- delete_multiple_memories - Bulk delete
- delete_all_memories - Clear all memories
Bulk Delete UI - Select and delete multiple memories in Settings

🧠 Agent Skills System (#895, #958)

Skills Service - Modular skills system for enhancing AI capabilities
Per-Profile Skills - Each profile can have its own set of skills
Import from GitHub - Import skills directly from GitHub repositories
Import from Local Folders - Add skills from local directories
Progressive Loading - Skills load on-demand to reduce token usage
Auto-Refresh - New skills detected automatically without restart
Proactive Context - Skills injected into system prompt automatically (#942)
Bundled Skills - Includes "Agent Skill Creation" meta-skill
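Progressive loading keeps the system prompt small: only a one-line summary per skill is injected, and the full instructions are fetched through `load_skill_instructions` when the model asks for them. A minimal sketch, assuming a hypothetical `Skill` shape and a hypothetical `skillId` argument name:

```typescript
interface Skill {
  id: string;
  name: string;
  description: string;
  instructions: string; // full body, potentially long
}

// Only compact summaries go into the system prompt.
function skillsPromptSection(skills: Skill[]): string {
  const lines = skills.map((s) => `- ${s.id}: ${s.name}: ${s.description}`);
  return ["Available skills (call load_skill_instructions for details):", ...lines].join("\n");
}

// Handler for a load_skill_instructions-style tool call.
function loadSkillInstructions(skills: Skill[], skillId: string): string {
  const skill = skills.find((s) => s.id === skillId);
  if (!skill) throw new Error(`Unknown skill: ${skillId}`);
  return skill.instructions;
}
```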

📊 Langfuse Observability Integration (#929, #941, #947)

LLM Call Tracing - All LLM calls traced with model, prompts, responses, and token usage
Agent Session Traces - Complete agent workflows tracked from start to finish
MCP Tool Call Spans - Each tool invocation logged with inputs/outputs
Sessions Support - Group traces by conversation for multi-turn debugging
Profile Tags - Filter traces by profile name in Langfuse dashboard
Optional Dependency - Install langfuse only when needed
Settings UI - Configure via Settings → General → Langfuse Observability

🔗 Persistent Cloudflare Tunnel URLs (#922, #954)

Named Tunnels - Persistent URLs that remain the same across restarts
Quick Tunnels - Existing random URL functionality preserved
Auto-Start on Launch - Tunnel can start automatically when app opens
Tunnel Mode Selector - Choose between Quick and Named tunnels in UI
Available Tunnels List - Shows existing tunnels when logged in
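The two tunnel modes map onto two `cloudflared` invocations: a Quick Tunnel (`cloudflared tunnel --url <local-url>`), which gets a random trycloudflare.com URL on each start, and a named tunnel (`cloudflared tunnel run <name>`), whose hostname persists because it is configured once in the Cloudflare account. A minimal sketch of building those arguments; the local port and tunnel name are hypothetical, and the named tunnel is assumed to already have its ingress rules in cloudflared's config file:

```typescript
type TunnelConfig =
  | { mode: "quick"; localUrl: string }    // random URL, no account needed
  | { mode: "named"; tunnelName: string }; // persistent URL, requires login

// Builds the argument list for the cloudflared CLI.
function cloudflaredArgs(cfg: TunnelConfig): string[] {
  return cfg.mode === "quick"
    ? ["tunnel", "--url", cfg.localUrl]
    : ["tunnel", "run", cfg.tunnelName];
}
```

In Electron's main process such an array could be passed to Node's `child_process.spawn("cloudflared", args)`. Listing existing named tunnels corresponds to `cloudflared tunnel list`, which works after `cloudflared tunnel login`.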

📱 Mobile & Cross-Device Features (#962, #972)

Chat Sync - Sync chat state between desktop and mobile app
Conversation Continuity - Continue conversations across devices
Compact Chat UI - Single-line collapsed view for mobile
Pull-to-Refresh - Sync with desktop in real-time

🔧 Inter-Agent Communication (#959)

send_agent_message - Send messages between running agent sessions
Agent Coordination - Enables collaborative multi-agent workflows
list_running_agents - Discover active agent sessions

📱 WhatsApp Harness Improvements (#910, #905)

Message Handling in MCP Server - Moved message logic from desktop harness to MCP server
Conversation ID Persistence - Tracks conversations across sessions
/new Command - Start fresh conversations with /new command
Automatic Typing Indicator - Shows typing when agent starts processing
Harness Output Mode - Configurable auto-response via harness layer
WhatsApp Toggle - Enable/disable in main settings UI (#934)

💾 Context Compaction & Memory Management (#908, #909)

Persistent Compaction - Older messages summarized and saved to disk
No More Re-summarization - Summaries persist across sessions
Conversation Compaction on Load - Summarizes older messages once a conversation exceeds 20 messages
MCP Process Cleanup - Properly terminates MCP server processes on app quit
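Compaction on load can be sketched as: once a conversation exceeds 20 messages, replace everything except the most recent messages with a single summary message. Here `summarize` is a hypothetical callback (in practice an LLM call) and `keepRecent = 10` is an assumed value; the notes specify only the 20-message trigger and that summaries are persisted to disk, which this sketch omits:

```typescript
interface Message {
  role: "system" | "user" | "assistant" | "tool";
  content: string;
}

// Keeps the newest `keepRecent` messages verbatim and folds the rest into one summary.
function compactOnLoad(
  messages: Message[],
  summarize: (older: Message[]) => string,
  threshold = 20,
  keepRecent = 10,
): Message[] {
  if (messages.length <= threshold) return messages;
  const cut = messages.length - keepRecent;
  const summary: Message = {
    role: "system",
    content: `Summary of earlier conversation: ${summarize(messages.slice(0, cut))}`,
  };
  return [summary, ...messages.slice(cut)];
}
```

Persisting the returned array means the summary is computed once rather than on every load.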

🎮 Profile CRUD Tools (#938)

create_profile - Create new profiles programmatically
update_profile - Modify existing profiles
delete_profile - Remove profiles with safeguards
duplicate_profile - Copy profiles including all configurations

📐 Model Registry with Fuzzy Matching (#907)

~100 Models Supported - Comprehensive registry for context window detection
Fuzzy Matching - Correctly identifies models through proxies (e.g., Claude via OpenAI-compatible endpoints)
Providers Covered - Anthropic, OpenAI, Google, xAI, DeepSeek, Mistral, Qwen, Llama
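Fuzzy matching here means normalizing whatever model ID a proxy reports before looking it up. A minimal sketch, assuming a two-entry registry (Claude 3.5 Sonnet's 200K and GPT-4o's 128K context windows) and a hypothetical 8,192-token fallback; SpeakMCP's real registry and matching rules are more extensive:

```typescript
// Tiny illustrative registry keyed by normalized model-ID prefixes.
const CONTEXT_WINDOWS: Record<string, number> = {
  "claude-3-5-sonnet": 200_000,
  "gpt-4o": 128_000,
};

// "anthropic/claude-3.5-sonnet" -> "claude-3-5-sonnet"
function normalizeModelId(id: string): string {
  return id
    .toLowerCase()
    .replace(/^.*\//, "") // drop provider prefixes added by proxies
    .replace(/[._]/g, "-"); // unify version separators
}

// Picks the longest registry key that prefixes the normalized ID.
function contextWindowFor(id: string, fallback = 8_192): number {
  const norm = normalizeModelId(id);
  let best: string | undefined;
  for (const key of Object.keys(CONTEXT_WINDOWS)) {
    if (norm.startsWith(key) && (best === undefined || key.length > best.length)) best = key;
  }
  return best !== undefined ? CONTEXT_WINDOWS[best] : fallback;
}
```

Preferring the longest matching key keeps a more specific entry (for example a "-mini" variant, if one were added) from being shadowed by its shorter base name.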

🛠️ MCP Tool Discovery (#948, #950)

Lazy Loading - Lightweight system prompt with on-demand tool details
list_server_tools - Get all tools from a specific MCP server
get_tool_schema - Get full JSON schema for any tool
Dynamic Tool Filtering - Reduces the number of tools sent per call from 64 to ~20

🚀 UX & UI Improvements

Session Tiles

Expand to Full Window (#915) - New button to expand session tile to fill entire window
Clickable Title Area (#886) - Click anywhere on title to collapse/expand
Removed Max Height Constraint (#927) - Tiles can now fill available vertical space
Proper Sizing Transitions (#914) - Panel resizes correctly when switching from voice to agent mode

Floating Panel

Auto-Hide When Main Focused (#887) - Panel hides when main window has focus
Cleaner UI (#904) - Removed ESC hint and tool call progress indicator
Responsive Hotkey Hints (#898, #980) - Hints hide on narrow screens and during sessions
Drag Fix (#978) - Panel no longer closes during drag/button interactions

Sidebar & Navigation

Improved Sidebar UI (#930) - Added 'Past' label, simplified search placeholder
Past Sessions Cleanup (#891) - Removed redundant title text
Agent Mode Keybind Visibility (#892) - Aura keybind now visible on sessions page

Mobile App

Compact Tool Calls (#932, #972) - Tool calls collapse to single line
Session Title Fix (#931) - Correct title shows instead of 'Transcribing...'
Compact Chat UI (#972) - Single-line collapsed view matching desktop styling

Streamer Mode (#893)

Hide Sensitive Info - Masks phone numbers, API keys, QR codes, and URLs
Global Indicator - Shows in sidebar footer when active
Privacy During Streaming - Protect sensitive data when screen sharing
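Text masking for Streamer Mode can be as simple as a list of regular expressions applied before rendering. A minimal sketch; the patterns are assumptions (an OpenAI-style `sk-` key prefix, http(s) URLs, loose phone-number digit runs) and will over-mask some strings such as timestamps. QR codes would be hidden at the component level rather than by text masking:

```typescript
// Order matters: keys and URLs are masked before the loose phone pattern runs.
const SENSITIVE_PATTERNS: RegExp[] = [
  /\bsk-[A-Za-z0-9_-]{16,}/g, // OpenAI-style secret keys
  /https?:\/\/\S+/g,          // URLs, including tunnel URLs
  /\+?\d[\d\s().-]{7,}\d/g,   // phone-number-like digit runs
];

function maskSensitive(text: string, mask = "••••"): string {
  return SENSITIVE_PATTERNS.reduce((acc, re) => acc.replace(re, mask), text);
}
```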

🔧 API & Settings Improvements

Feature Toggle API (#939)

verificationEnabled - Query/toggle verification setting
messageQueueEnabled - Query message queue setting
parallelToolExecutionEnabled - Query parallel execution setting
toggle_verification - New tool to enable/disable verification

New Builtin Tools

read_media_file - Read images/audio as base64 for multimodal LLM input
load_skill_instructions - Fetch full skill instructions on-demand

Settings Reorganization

Langfuse Moved (#949) - Settings moved from sidebar to General Settings page
Removed Auto-Configured Filesystem Server (#961) - Use execute_command for filesystem operations

🐛 Bug Fixes

Agent Loop & Completion (#946, #967, #970)

Fixed empty-response retry loops by adding a retry counter and limit
Fixed verification loops by incrementing the fail count in all paths
Fixed tool context loss during context shrinking
Simplified LLM integration to trust the model's completion signals
Added tool timeout handling (20s default, 120s for browser automation)
Fixed parallel call race conditions
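The tool timeout handling can be sketched as a race between the tool call and a timer. The 20s and 120s values come from the notes above; recognizing browser-automation tools by a `playwright`/`browser` name prefix is a hypothetical heuristic, not SpeakMCP's actual rule:

```typescript
const DEFAULT_TOOL_TIMEOUT_MS = 20_000;
const BROWSER_TOOL_TIMEOUT_MS = 120_000;

// Hypothetical heuristic for spotting browser-automation tools.
function timeoutFor(toolName: string): number {
  return /^(playwright|browser)/i.test(toolName) ? BROWSER_TOOL_TIMEOUT_MS : DEFAULT_TOOL_TIMEOUT_MS;
}

// Rejects if `p` has not settled within `ms`; clears the timer either way.
function withTimeout<T>(p: Promise<T>, ms: number, label: string): Promise<T> {
  return new Promise<T>((resolve, reject) => {
    const timer = setTimeout(() => reject(new Error(`${label} timed out after ${ms}ms`)), ms);
    p.then(
      (value) => { clearTimeout(timer); resolve(value); },
      (err) => { clearTimeout(timer); reject(err); },
    );
  });
}
```

A timeout only stops the agent from waiting; the underlying MCP call keeps running unless it is cancelled separately.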

Verification Logic (#951)

Fixed direct verifier call bypassing safety features
Fixed duplicate messages in verification context
Fixed loop detection false positives on short responses
Fixed empty tool results handling
Unified all verification paths

Voice & Recording

Mic Button Fix (#906) - Voice recording now takes precedence over text input
Waveform Fix (#976) - Waveform now shows for normal voice dictation mode

Model & Preset

Summarization Model Sync (#977) - Summarization model updates when switching presets

Memory & Processes

MCP Process Cleanup (#909) - No more orphaned node processes (2GB+ memory leak fixed)

📦 Downloads

Cross-Platform Support: macOS (Apple Silicon & Intel), Windows, Linux, Android, iOS

macOS Builds (Signed)

DMG: SpeakMCP-1.2.0-arm64.dmg | SpeakMCP-1.2.0-x64.dmg

Android

APK: SpeakMCP-1.2.0.apk

Linux

AppImage: SpeakMCP-1.2.0.AppImage


🔄 Migration Notes

No breaking changes - All existing functionality preserved
Automatic migration - Settings and data migrate seamlessly
New features opt-in - All new features work with existing configurations
Backward compatible - Existing API endpoints and data structures un...


24 Dec 06:49

aj47

v1.1.0

3c2bf37

v1.1.0

🎯 Major Features

🤖 Vercel AI SDK Migration (#812)

Simplified LLM Code - Removed ~1,100 lines of custom HTTP/fetch logic
Provider Flexibility - Easy to add new providers (Anthropic, etc.)
Better Streaming - AI SDK handles streaming protocols natively
Type Safety - Better TypeScript support from AI SDK
Maintained Compatibility - Same public API, MCP tools work unchanged

📋 Kanban View for Sessions (#807)

Three-Column Layout - Idle, In Progress, and Done columns
Visual State Indicators - Clear status for each session
View Toggle - Switch between Grid and Kanban views
Session Organization - Better workflow management

📝 Predefined Prompts (#809)

Save Frequent Prompts - Quick access to commonly used prompts
One-Click Insert - Click to insert prompt into input field
Full Management - Add, edit, and delete saved prompts
Persistent Storage - Prompts saved across sessions

📱 Mobile Settings Management (#744)

Profile Switching - Switch between profiles directly from mobile app
MCP Server Management - View connection status and enable/disable servers remotely
Feature Toggles - Control post-processing, TTS, and tool approval from mobile
Pull-to-Refresh - Sync settings with desktop in real-time
New API endpoints: /v1/profiles, /v1/mcp/servers, /v1/settings

🔍 MCP Registry Integration (#785)

Official Registry Browser - Discover 100+ MCP servers from the official registry
One-Click Installation - Add servers with a single click
Smart Search - Find servers by name and description
Server Type Badges - npm, PyPI, Docker, Remote server indicators
5-Minute Caching - Reduced API calls for better performance
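Five-minute caching of registry lookups fits a small time-to-live cache. A minimal sketch with an injectable clock; the cache key format and the `load` callback (standing in for the actual registry request) are hypothetical:

```typescript
const FIVE_MINUTES_MS = 5 * 60 * 1000;

// Caches loaded values per key until the TTL expires.
class TtlCache<V> {
  private readonly entries = new Map<string, { value: V; expiresAt: number }>();

  constructor(
    private readonly ttlMs: number,
    private readonly now: () => number = Date.now,
  ) {}

  async getOrLoad(key: string, load: () => Promise<V>): Promise<V> {
    const hit = this.entries.get(key);
    if (hit && hit.expiresAt > this.now()) return hit.value; // fresh: skip the request
    const value = await load();
    this.entries.set(key, { value, expiresAt: this.now() + this.ttlMs });
    return value;
  }
}
```

Two concurrent misses for the same key will both call `load` in this sketch; caching the in-flight promise instead of the value would deduplicate them.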

📤 Enhanced Profile Export/Import (#772)

MCP Server Definitions - Export now includes all enabled MCP server configurations
Model Settings - Export includes model configuration settings
Smart Import - Merges MCP definitions without overwriting existing config
Easy Sharing - Share complete profiles with team members

🚀 Performance & UX Improvements

Recording Latency Reduction (#734)

550ms faster - Reduced hold-to-record delay from 800ms → 250ms
Overlapped initialization - Start recording before showing panel UI
Snappier response - Faster feedback when holding Ctrl

Mobile Text Interaction (#735)

Expandable/Collapsible Text - Tap anywhere on collapsed text to expand
Selectable Content - Copy LLM responses and tool parameters
Better Tool Cards - Larger tap targets for expanding tool results
Visual Feedback - Pressed states for better UX

Session Management (#739, #740)

Always-Visible Start Buttons - Start new sessions anytime, even with active sessions
Queueable Voice Input - Record voice messages during agent processing
Message Queuing - Transcripts queue automatically when agent is busy
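Message queuing during agent processing reduces to a FIFO queue drained by a single worker. A minimal sketch, assuming a hypothetical `run` callback that processes one transcript at a time and an optional `onError` hook:

```typescript
// Transcripts submitted while the agent is busy wait their turn in FIFO order.
class MessageQueue {
  private readonly pending: string[] = [];
  private busy = false;

  constructor(
    private readonly run: (message: string) => Promise<void>,
    private readonly onError: (err: unknown) => void = () => {},
  ) {}

  submit(message: string): void {
    this.pending.push(message);
    void this.drain();
  }

  private async drain(): Promise<void> {
    if (this.busy) return; // the active drain loop will pick the message up
    this.busy = true;
    while (this.pending.length > 0) {
      const next = this.pending.shift() as string;
      try {
        await this.run(next);
      } catch (err) {
        this.onError(err); // one failed message does not block the rest
      }
    }
    this.busy = false;
  }
}
```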

UI Polish (#733, #738, #800, #801, #806, #811)

Stop Sign Icon - Changed kill switch from X to OctagonX for clarity
Collapsed Servers - MCP servers collapsed by default for cleaner UI
Kill Switch in Follow-ups - Stop button now in follow-up input panels
Responsive MCP Modal (#800) - Tool details modal now scrollable and responsive
Improved Sidebar Layout (#801) - Settings above sessions, sessions scroll to bottom
Sessions Icon (#806) - Quick navigation icon in collapsed sidebar
Edit Profiles Shortcut (#811) - Direct link to profile settings in dropdown

Mobile Improvements (#794, #816)

Android Branding (#794) - Updated app name, icons, and speakmcp:// deep linking
Conversation Recovery (#816) - Recover server state on connection retry (no duplicates)

🔧 Code Quality & Maintenance

Major Refactoring Sprint (#775, #776, #777, #778, #779, #780)

keyboard.ts Modularization (#775) - Split 1,170 lines into focused modules
MCP Config Manager (#776) - React Context to eliminate prop drilling
tipc.ts Split (#777) - 66% reduction (2,913 → 974 lines) into 14 domain modules
MCP Service Refactor (#778) - Extracted into focused, maintainable modules
ChatScreen Components (#779) - Mobile app code organization with custom hooks
LLM Provider Abstraction (#780) - Clean provider interface for all LLM backends

LLM Code Consolidation (#781)

Removed structured-output.ts - Consolidated unused code (~285 LOC reduction)
Moved makeStructuredContextExtraction - Relocated to llm-fetch.ts

Shared Package Improvements (#773)

Type Consolidation - AgentProgressStep, AgentProgressUpdate moved to shared
New Utilities - formatDuration, formatTimestamp, statusColors
Custom Hooks - useCollapsibleState, useCollapsibleSet for UI state
Dependency Cleanup - Proper runtime vs dev dependency classification

Testing Infrastructure (#770)

E2E Tests - Playwright tests for Electron with custom fixtures
Smoke Tests - App launch and basic navigation tests
Settings Tests - Configuration and provider testing
MCP Tests - Server management and session tests
CI Integration - GitHub Actions workflow for automated testing

Refactoring Issue Templates (#745)

Created 9 detailed issue templates for future refactoring work
Includes proposals for tipc.ts, mcp-service.ts, keyboard.ts modularization

🐛 Bug Fixes

LLM & Provider

Empty Response Handling (#793, #797) - Fixed false "Network error" for valid empty completions
Provider Name Display (#737) - Show actual preset name (OpenRouter, Together AI) instead of generic "OpenAI"
Groq TTS Update (#784) - Updated to Orpheus models (PlayAI deprecated)

Mobile & Desktop

Disabled Server Tools (#743) - Hide tools from disabled MCP servers
JSX Nesting (#728) - Fixed parse errors in MCP config manager
Tunnel Persistence (#722) - Auto-reconnect mobile app on restart with stable device ID

UI/UX

Tool Collapse (#713) - Collapsible server groups in Tools section with state persistence
Mic Button (#732) - Mic clickable during agent processing with message queuing

📊 Stats

40+ PRs merged since v1.1.0
3,000+ lines refactored - Major code quality improvements
AI SDK migration - Simplified LLM integration
E2E testing infrastructure - New Playwright test suite
Improved mobile experience - Settings management, conversation recovery, Android branding

🔄 Migration Notes

No breaking changes - All existing functionality preserved
Automatic migration - Settings and data migrate seamlessly
New features opt-in - All new features work with existing configurations
Backward compatible - Existing API endpoints and data structures unchanged

📥 Downloads

Cross-Platform Support: macOS (Apple Silicon & Intel), Windows, Linux, Android, iOS

macOS Builds

DMG: SpeakMCP-1.2.0-arm64.dmg | SpeakMCP-1.2.0-x64.dmg
PKG: SpeakMCP-1.2.0-arm64.pkg | SpeakMCP-1.2.0-x64.pkg
ZIP: SpeakMCP-1.2.0-arm64.zip | SpeakMCP-1.2.0-x64.zip


🙏 Acknowledgments

Thanks to all contributors and users who provided feedback!

Full Changelog: v1.1.0...v1.2.0

Jan 1st update:
New Features:
• Reset Layout Button - Restores agent tiles to default dimensions in grid view
• Profile Import/Export (Mobile) - Share API export + JSON paste import
• Profile Import/Export (Desktop) - Added missing Import/Export buttons
• GitHub Actions - New workflow for Linux/Windows builds

Bug Fixes:
• Agent premature completion fix (LLM verifier always runs when enabled)
• Text input hotkey fix (Ctrl+T no longer triggers voice timer)
• Blank hover panel fix for completed snoozed sessions
• Text input panel resize after waveform recording
• Infinite refetch loop prevention on mobile
• Verifier prompt bias fix
• Safe error message checking
• Profile import error handling separation

License: AGPL-3.0


08 Dec 03:35

aj47

v1.0.0

11710c3

SpeakMCP v1.0.0

SpeakMCP v1.0.0 - Initial Release 🎉

The first official release of SpeakMCP - an AI-powered dictation tool with MCP (Model Context Protocol) integration.

Features

🎤 Voice Dictation - Hold Ctrl to record and transcribe your voice
🤖 AI-Powered - Integrates with OpenAI, Anthropic, Google, and more
🔧 MCP Tools - Connect to Model Context Protocol servers for extended functionality
📱 Cross-Platform - Available for macOS (Intel & Apple Silicon) and Android
🎯 Agent Mode - Multi-step AI agent with tool calling capabilities
🔒 Privacy-First - All processing happens locally

Downloads

Windows

There are now .exe setup and .exe portable builds for Windows!

macOS

Apple Silicon (M1/M2/M3): SpeakMCP-1.0.0-arm64.dmg
Intel: SpeakMCP-1.0.0-x64.dmg
ZIP and PKG installers also available

Android

APK: SpeakMCP-1.0.0-android.apk

Installation

macOS

Download the appropriate DMG for your Mac
Open the DMG and drag SpeakMCP to Applications
On first launch, you may need to right-click and select "Open" due to Gatekeeper
Grant accessibility and microphone permissions when prompted

Android

Download the APK file
Enable "Install from unknown sources" in your device settings
Install the APK
Grant microphone permissions when prompted

Requirements

macOS: 12.0 (Monterey) or later
Android: Android 7.0 (API level 24) or later
API key for at least one AI provider (OpenAI, Anthropic, etc.)

Getting Started

Launch SpeakMCP
Enter your AI provider API key in Settings
Hold Ctrl to start recording
Release Ctrl to transcribe and get AI response
Optional: Configure MCP servers for extended capabilities

Thank you for trying SpeakMCP! Please report any issues on GitHub.


07 Dec 20:27

aj47

150b161

v1.0.0 - Mobile App & Parallel Agent Sessions

🚀 SpeakMCP 1.0 is here!

This release marks a major milestone with two flagship features: the new Mobile App for voice-controlled AI on the go, and Parallel Agent Sessions for running multiple AI agents simultaneously.

⚠️ Platform Support

Windows and Linux do not currently support MCP tools in this release.

For Windows and Linux users who want dictation-only functionality, please use v0.2.2 which includes Windows (.exe) and Linux (.AppImage, .deb, .snap) builds.

📱 Mobile App

Control your AI agents from anywhere! The SpeakMCP mobile app connects to your desktop via the Remote Server API.

Features

Voice input - Speak commands to your AI agent from your phone
Real-time progress - Watch your agent work in real-time
Conversation continuity (#401) - Continue conversations seamlessly between mobile and desktop
Emergency kill switch (#398) - Stop all agents instantly with the /v1/emergency-stop endpoint
QR code setup - Scan a QR code to configure the mobile app instantly
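A client-side call to the emergency-stop endpoint might look like this sketch. The endpoint path comes from the notes; the POST method, the Bearer-token header, and the `emergencyStop` helper are assumptions, and `fetchImpl` is injected so the function can be exercised without a network:

```typescript
// Minimal fetch-like signature so the helper can be tested without a network.
type FetchLike = (
  url: string,
  init: { method: string; headers: Record<string, string> },
) => Promise<{ ok: boolean; status: number }>;

// Assumptions: POST and a Bearer API key; check the desktop app's Remote Server settings.
async function emergencyStop(baseUrl: string, apiKey: string, fetchImpl: FetchLike): Promise<void> {
  const url = `${baseUrl.replace(/\/+$/, "")}/v1/emergency-stop`; // strip trailing slashes
  const res = await fetchImpl(url, {
    method: "POST",
    headers: { Authorization: `Bearer ${apiKey}` },
  });
  if (!res.ok) throw new Error(`Emergency stop failed: HTTP ${res.status}`);
}
```

In a runtime with a global `fetch`, usage would be `await emergencyStop(serverUrl, apiKey, fetch)`.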

Monorepo Architecture (#417)

Converted to pnpm workspace structure
Mobile app lives at apps/mobile/
Desktop app at apps/desktop/
Shared design tokens package (@speakmcp/shared) for consistent styling
Unified development workflow: pnpm dev for desktop, pnpm dev:mobile for mobile

Cloudflare Tunnel Integration (#363)

One-click internet exposure - No port forwarding or network config needed
Cloudflare Quick Tunnel (no account required)
QR code generation with speakmcp:// deep links
Instant mobile app configuration by scanning

🔀 Parallel Agent Sessions

Run multiple AI agents at the same time! The new tiling session dashboard lets you manage multiple concurrent agent sessions.

Tiling Session Dashboard (#359)

Sessions dashboard is now the landing page
Responsive tiling layout (1=full width, 2=50/50, 3+=responsive grid)
Each tile shows full conversation with internal scroll
New sessions animate into the grid
Tile size persistence (#410) - Tiles remember their size
Default tile width optimized (#407) - Two tiles fit side-by-side
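The tiling rule (1 = full width, 2 = 50/50, 3+ = responsive grid) can be written as a column-count function. A minimal sketch; the 400px minimum tile width is a hypothetical parameter, not a documented value:

```typescript
// Column count for the session grid: 1 = full width, 2 = side by side, 3+ = as many as fit.
function gridColumns(sessionCount: number, containerWidthPx: number, minTileWidthPx = 400): number {
  if (sessionCount <= 1) return 1;
  if (sessionCount === 2) return 2;
  const fit = Math.max(1, Math.floor(containerWidthPx / minTileWidthPx));
  return Math.min(sessionCount, fit);
}
```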

Unified Sessions & History (#429)

History tab merged into Sessions page
Past sessions in collapsible section
Click any past session to open as new tile
Search and filter past sessions
Lazy loading with "Load More"

Multi-Session Controls

Panel hide button (#405) - Minimize entire floating panel when multiple sessions active
Sidebar session navigation (#438) - Click sidebar session to scroll to its tile
Continue from tiles (#408, #381) - Follow-up stays in same tile, no new window spawns
Active agents height limit (#406) - Prevents overlap with macOS window controls

Streaming Output (#388)

See LLM responses as they're generated in real-time
Live streaming display with animated cursor
Auto-scroll follows streaming content

🎯 Other Major Features

Built-in Settings Tools for Self-Configuration (#386)

The agent can now configure itself! 5 new built-in tools:

list_mcp_servers - View all configured MCP servers with status
toggle_mcp_server - Enable/disable MCP servers by name
list_profiles - View all profiles and which is active
switch_profile - Switch between profiles by ID or name
get_current_profile - Get current profile with full guidelines

Per-Profile MCP Server Configurations (#394)

Each profile stores its own MCP server settings
Automatically apply MCP config when switching profiles
Different profiles can have different enabled servers/tools

Editable Base System Prompt (#431)

Edit the base system prompt in Agent Settings
Per-profile system prompt storage
One-click restore to default prompt

Direct Response Support (#437)

Agent answers simple questions without forcing tool calls
Reduced latency for Q&A interactions
Smarter detection of when tools are actually needed

System Prompt Optimization (#432)

~50% token reduction in system prompts
Removed dead code and redundant instructions
Cleaner, more focused prompts

🔧 UI/UX Improvements

Text/Voice Follow-up Inputs (#383, #376)

New input component in floating agent progress overlay
Text input field + Submit button + Voice button
Continue conversations via text OR voice

Profile Management Redesign (#425, #426)

Profile dropdown in sidebar with full management
Create, edit, delete, import/export from dropdown
Create new profile directly from dropdown

Always-On Agent Mode (#428)

MCP tools and agent mode now always enabled
Removed unnecessary toggle switches
Safety settings moved to General Settings

Tool Calling UX (#337)

Space to approve, Escape to deny tool calls
Better parameter previews
Hotkey hints on buttons

Other UI Fixes

Rate limit retry banner with countdown (#369)
Copy button for agent responses (#333)
Settings reorganization (#348, #362)
Submit hint shows 'Enter' for mic button (#396)

🐛 Bug Fixes

Agent & Session

Panel now focusable when agent completes (#435)
Text input responsive after agent finishes (#422)
Killswitch properly closes panel during MCP init (#340)
Final output summary expanded by default (#341)
Floating GUI shows on voice keybind (#375)
Scrollbars hidden until hover (#382)
Maximize button no longer creates blank panel (#372)

Profile & Settings

Profile text no longer truncated after save (#427)
Panel doesn't appear when continuing from tiles (#413)

LLM & Provider

Model preferences persist across sessions (#364)
Empty content with toolCalls accepted (#346)
Verifier JSON schema fixes (#347)
Cerebras API compatibility (#352)

Logging

Stopped logging entire conversation history (#409, #412)

Development

Metro bundler pnpm support (#418)
Auto-build Rust binary before dev (#419)

🧹 Code Quality

Testing

56 new key-utils tests (#421)
Total tests: 32 → 88 (+175%)

Cleanup

Removed ~941 lines of unused code (#329)
Debug logging cleanup (#330)
DEBUGGING.md distilled from 425 to 48 lines (#374)

📊 Stats

50+ PRs merged since v0.3.0
100+ commits
88 tests (up from 32)
Version: 0.3.0 → 1.0.0

🙏 Acknowledgments

Thanks to all users who provided feedback!

Key Pull Requests:

#438 - Sidebar session navigation
#437 - Direct response support
#435 - Panel focusability fix
#432 - System prompt optimization
#431 - Editable system prompt
#429 - Unified sessions & history
#428 - Always-on agent mode
#427 - Profile sync fix
#425 - Profile dropdown management
#421 - Key-utils tests
#417 - pnpm monorepo with mobile
#410 - Tile size persistence
#408 - Continue in same tile
#405 - Panel hide button
#401 - Mobile conversation continuity
#398 - Emergency stop endpoint
#397 - Final summary fixes
#394 - Per-profile MCP configs
#388 - Streaming output
#386 - Built-in settings tools
#383 - Follow-up inputs
#363 - Cloudflare tunnel
#359 - Tiling session dashboard

Full Changelog: v0.3.0...v1.0.0

Released: December 2025 | License: AGPL-3.0


24 Nov 22:27

aj47

untagged-5c7601642468acca537b

d3cad06

v0.3.0 - Multi-Session Agent Support

⚠️ Platform Support

Windows and Linux do not currently support MCP tools in this release.

For Windows and Linux users who want dictation-only functionality, please use v0.2.2 which includes Windows (.exe) and Linux (.AppImage, .deb, .snap) builds.

🎯 Major Features

Multi-Session Agent Support (#264)

Run multiple agent sessions concurrently with independent progress tracking
Snooze/minimize sessions to background and restore from Active Agents sidebar
Per-session killswitch - stopping one session doesn't affect others
Eliminated "Initializing..." delay for new sessions
Fixed state machine violations preventing generic "Processing..." states

🔧 Improvements

Session Management (#239)

Voice input now creates a new session by default, simplifying the workflow
Configurable via alwaysCreateNewSessionForVoice setting
Text input continues to support conversation continuation

UI/UX Enhancements

Voice input waveform now visible when agent is running (#260)
Text input automatically receives focus when spawned via keybind (Ctrl+T) (#258)
Improved visibility of MCP transport type dropdown (#257)
Assistant message and thinking block expansion state now persists when new messages arrive (#274)
Each <think> section now has unique ARIA IDs for accessibility

Tool Management

Tools from deleted MCP servers now properly removed from UI (#114)
Tool call requests appear immediately in agent progress before responses (#202)
Added Playwright MCP server as example for browser automation (#287)

Context Management (#304)

Intelligent tool response processing to prevent context overflow
Server-aware summarization (different strategies for Playwright, Desktop Commander, GitHub)
Configurable thresholds (20KB/50KB defaults)
Real-time progress feedback during large response processing
UI now shows "Summarizing context (1/4)" during long operations instead of appearing frozen

Performance

Replaced polling with push-based events for Active Agents Sidebar (#298)
Immediate UI updates, reduced log spam, lower CPU usage
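A rough illustration of the polling-to-push change. Node's EventEmitter stands in here for whatever channel the app actually uses between processes, which these notes do not describe. The event name and the `AgentStateBus` class are hypothetical.

```typescript
import { EventEmitter } from "node:events";

// Hypothetical snapshot of what the Active Agents sidebar displays.
interface AgentSummary {
  sessionId: string;
  status: "running" | "snoozed" | "stopped";
}

// Push model: the producer emits whenever state changes, so subscribers
// update immediately and no timer runs while sessions are idle.
class AgentStateBus extends EventEmitter {
  private state = new Map<string, AgentSummary>();

  update(summary: AgentSummary): void {
    this.state.set(summary.sessionId, summary);
    this.emit("agents-updated", Array.from(this.state.values()));
  }

  // Returns an unsubscribe function so the UI can clean up on unmount.
  subscribe(listener: (agents: AgentSummary[]) => void): () => void {
    this.on("agents-updated", listener);
    return () => {
      this.off("agents-updated", listener);
    };
  }
}
```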

🐛 Bug Fixes

OAuth & Deep Links

Fixed OAuth deep link callbacks on Windows and Linux by registering speakmcp:// protocol (#259, #225)
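After the speakmcp:// protocol is registered, the OS hands the callback URL to the app, and the app still has to extract the OAuth parameters from it. The sketch below shows that step. The `oauth/callback` route is a hypothetical example. `code`, `state` and `error` are the standard OAuth 2.0 authorization-response parameters.

```typescript
// Sketch: pull OAuth parameters out of a speakmcp:// deep link.
// The "oauth/callback" route is hypothetical.
interface OAuthCallback {
  code: string;
  state: string;
}

function parseOAuthDeepLink(rawUrl: string): OAuthCallback {
  const url = new URL(rawUrl);
  if (url.protocol !== "speakmcp:") {
    throw new Error(`Unexpected protocol: ${url.protocol}`);
  }
  // For non-special schemes, the WHATWG URL parser treats "oauth" as the host.
  const route = `${url.host}${url.pathname}`;
  if (route !== "oauth/callback") {
    throw new Error(`Unknown deep link route: ${route}`);
  }
  const error = url.searchParams.get("error");
  if (error) throw new Error(`OAuth error: ${error}`);

  const code = url.searchParams.get("code");
  const state = url.searchParams.get("state");
  if (!code || !state) throw new Error("Missing code or state");
  return { code, state };
}
```

Protocol registration is platform-specific and is not shown here. The `state` value should be compared against the one generated when the flow started.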

Agent State Management

Kill switch now properly resets all agent state variables (#241)
Old agent messages no longer appear when starting new sessions
Session context no longer leaks between sessions (#294)
Voice input no longer loads conversation history from previous sessions (#299)
Completed sessions no longer block voice dictation (#301, #303)
Waveform no longer shows unexpectedly after agent finishes (#292)

Display & Formatting

Tool call expansion shows complete details instead of basic summaries (#142)
TTS button state synchronization fixed (#140)
Post-processing now auto-appends the transcript when the {transcript} placeholder is missing (#93)
Waveform positioning fixed to span full width while centered (#109)
Hide/Show buttons now work correctly in tool execution view (#289)
Session tabs now display correct titles (#288)
Progress UI shows immediately after voice input submission (#288)
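The #93 fallback might look like the following sketch. The function name and the blank-line separator are assumptions. Only the `{transcript}` placeholder comes from the notes.

```typescript
// Sketch: substitute {transcript} into a post-processing prompt, or append
// the transcript when the placeholder is missing so it is never dropped.
function buildPostProcessingPrompt(template: string, transcript: string): string {
  const placeholder = "{transcript}";
  if (template.includes(placeholder)) {
    // Replace every occurrence of the placeholder.
    return template.split(placeholder).join(transcript);
  }
  // Assumed separator between the instructions and the appended transcript.
  return `${template}\n\n${transcript}`;
}
```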

LLM Handling

Graceful fallback on empty LLM responses (#156)
Parse non-standard reasoning field from providers like OpenRouter
Fixed OpenAI-compatible provider model discovery
Verifier no longer causes infinite loops on impossible tasks (#304)
Panel UI crashes fixed with null safety guards (#304)
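A sketch of normalizing a provider reply so that a non-standard reasoning field and empty content are both handled without crashing. It assumes the reasoning text arrives as a string `reasoning` property next to `content`. Check each provider's documentation for the exact shape. The interface and function names are hypothetical.

```typescript
// Assumed message shape from an OpenAI-compatible provider.
interface ProviderMessage {
  content?: string | null;
  reasoning?: string | null; // non-standard field, e.g. from OpenRouter
}

interface NormalizedReply {
  content: string;
  reasoning?: string;
  empty: boolean;
}

function normalizeReply(message: ProviderMessage | null | undefined): NormalizedReply {
  const content = message?.content?.trim() ?? "";
  const reasoning = message?.reasoning?.trim() || undefined;
  // Graceful fallback: flag an empty reply instead of throwing, so the
  // agent loop can log it and decide whether to retry or continue.
  return { content, reasoning, empty: content.length === 0 };
}
```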

🔧 New Configuration Options

// Tool response processing (src/main/config.ts)
mcpToolResponseProcessingEnabled: true
mcpToolResponseLargeThreshold: 20000      // 20KB
mcpToolResponseCriticalThreshold: 50000   // 50KB
mcpToolResponseChunkSize: 15000
mcpToolResponseProgressUpdates: true
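The sketch below shows one way these settings could fit together. The config keys come from the list above, but the tiering is an assumption, not documented behavior. Responses under the large threshold pass through, responses between the two thresholds are summarized, and responses above the critical threshold are split into chunks of mcpToolResponseChunkSize and summarized chunk by chunk with a progress label.

```typescript
// Keys mirror src/main/config.ts as listed above; the logic is a hypothetical sketch.
interface ToolResponseConfig {
  mcpToolResponseProcessingEnabled: boolean;
  mcpToolResponseLargeThreshold: number;
  mcpToolResponseCriticalThreshold: number;
  mcpToolResponseChunkSize: number;
}

type Plan =
  | { kind: "passthrough" }
  | { kind: "summarize" }
  | { kind: "chunked"; chunks: string[] };

// text.length counts UTF-16 code units, which approximates bytes for ASCII text.
function planToolResponse(text: string, cfg: ToolResponseConfig): Plan {
  if (!cfg.mcpToolResponseProcessingEnabled || text.length < cfg.mcpToolResponseLargeThreshold) {
    return { kind: "passthrough" };
  }
  if (text.length < cfg.mcpToolResponseCriticalThreshold) {
    return { kind: "summarize" };
  }
  const chunks: string[] = [];
  for (let i = 0; i < text.length; i += cfg.mcpToolResponseChunkSize) {
    chunks.push(text.slice(i, i + cfg.mcpToolResponseChunkSize));
  }
  return { kind: "chunked", chunks };
}

// Progress label in the style shown in the UI, e.g. "Summarizing context (1/4)".
function progressLabel(index: number, total: number): string {
  return `Summarizing context (${index + 1}/${total})`;
}
```

With the defaults, a 60,000-character response yields four 15,000-character chunks, which matches the "(1/4)" style of progress message.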

🧹 Code Quality

Refactoring

Conversations section renamed to History for better semantic clarity (#149)
Removed unnecessary "Resume auto-scroll" indicator (#158)
Removed 'done', 'esc', and 'details' buttons from agent progress (#146)
Added comprehensive tool calls test suite (#256)
Improved TypeScript type safety with type-only imports
Added comprehensive debug logging throughout

🔒 Security & Compatibility

All changes maintain backward compatibility
No breaking changes to existing functionality
Enhanced error handling throughout

📊 Stats

60+ commits since v0.2.3
25+ PRs merged

🙏 Acknowledgments

Thanks to all users who provided feedback!

Key Pull Requests:

#264 - Multi-session agent support
#304 - Context overflow prevention and UI stability
#298 - Push-based events for sidebar
#294 - Session context isolation
#292 - Recording cleanup fixes
#289 - Tool execution UI fixes
#288 - Session tabs and progress UI
#287 - Playwright MCP example
#274 - Expansion state persistence
#260 - Waveform visibility fix
#259 - OAuth deep links for Windows/Linux
#258 - Text input focus fix
#257 - MCP dropdown visibility
#256 - Tool calls test suite

Full Changelog: v0.2.3...v0.3.0

Released: November 2025 | License: AGPL-3.0


02 Nov 16:54

aj47

untagged-0baa6dfec77c0ce1c2a4

f1efb66

v0.2.3

SpeakMCP v0.2.3

🎉 What's New

🤖 LLM & Model Improvements

Auto-Detection of Model Capabilities (#229)

Self-Learning System: Automatically detects which models support structured output (JSON Schema/Object)
Runtime Cache: Learns from actual usage and caches capabilities for 24 hours
No Configuration Needed: Works with any new model automatically without hardcoding
Fixes Infinite Retry Loops: Resolves issues with models like google/gemini-2.5-flash getting stuck
Enhanced Debug Logging: 11 new debug points throughout LLM request/response flow (enable with DEBUG_LLM=1)
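A minimal sketch of a runtime capability cache with the 24-hour lifetime described in #229. The mode names, the fallback order and all identifiers are assumptions made for illustration. The clock is injectable so the expiry can be tested.

```typescript
// Hypothetical structured-output modes, tried richest-first.
type StructuredOutputMode = "json_schema" | "json_object" | "none";

interface CacheEntry {
  mode: StructuredOutputMode;
  learnedAt: number; // epoch milliseconds
}

const TTL_MS = 24 * 60 * 60 * 1000; // 24 hours

class ModelCapabilityCache {
  private entries = new Map<string, CacheEntry>();

  constructor(private now: () => number = Date.now) {}

  // Returns the cached mode, or undefined if unknown or expired.
  get(model: string): StructuredOutputMode | undefined {
    const entry = this.entries.get(model);
    if (!entry) return undefined;
    if (this.now() - entry.learnedAt > TTL_MS) {
      this.entries.delete(model);
      return undefined;
    }
    return entry.mode;
  }

  // Record the mode that actually worked for this model.
  learn(model: string, mode: StructuredOutputMode): void {
    this.entries.set(model, { mode, learnedAt: this.now() });
  }
}

const FALLBACK_ORDER: StructuredOutputMode[] = ["json_schema", "json_object", "none"];

// Use the learned mode if present; otherwise probe from richest to plainest.
function modesToTry(cache: ModelCapabilityCache, model: string): StructuredOutputMode[] {
  const known = cache.get(model);
  return known ? [known] : FALLBACK_ORDER;
}
```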

Enhanced Error Handling (#221)

Novita AI Support: Fixed generic "model inference" errors from Novita and similar providers
Improved Fallback Chain: Better detection of structured output errors triggers proper fallback
Conversation Loading Fix: Resolved TypeError when loading conversations with tool results

Cloudflare 524 Timeout Handling

Improved Retry Logic: 524 timeout errors now properly treated as retryable
Better Error Detection: Enhanced detection for gateway and Cloudflare-specific errors
User-Friendly Messages: Clear console output for retry progress
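A sketch of retry classification with exponential backoff. 524 is Cloudflare's "A Timeout Occurred" status. The rest of the status list, the delay constants and the function names are assumptions, not SpeakMCP's exact values.

```typescript
// Assumed set of retryable statuses; 524 is the Cloudflare timeout from the notes.
const RETRYABLE_STATUSES = new Set([408, 429, 500, 502, 503, 504, 520, 522, 524]);

function isRetryableStatus(status: number): boolean {
  return RETRYABLE_STATUSES.has(status);
}

// Exponential backoff capped at maxMs: 1s, 2s, 4s, ... (attempt is 0-based).
function backoffDelayMs(attempt: number, baseMs = 1000, maxMs = 30000): number {
  return Math.min(baseMs * 2 ** attempt, maxMs);
}

async function withRetry<T>(
  fn: () => Promise<T>,
  getStatus: (err: unknown) => number | undefined,
  maxAttempts = 3,
): Promise<T> {
  for (let attempt = 0; ; attempt++) {
    try {
      return await fn();
    } catch (err) {
      const status = getStatus(err);
      const canRetry = status !== undefined && isRetryableStatus(status) && attempt + 1 < maxAttempts;
      if (!canRetry) throw err;
      const delay = backoffDelayMs(attempt);
      // Console output so the user can follow retry progress.
      console.log(`Request failed with ${status}; retrying in ${delay} ms (attempt ${attempt + 2}/${maxAttempts})`);
      await new Promise((resolve) => setTimeout(resolve, delay));
    }
  }
}
```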

🛠️ MCP (Model Context Protocol) Enhancements

MCP Initialization Progress Feedback (#224)

Real-Time Updates: Shows which server is being initialized with progress count
Visual Feedback: Clear "Initializing MCP tools" message with server names
Polling Updates: UI updates every 500ms during initialization
Seamless Transition: Automatically proceeds to agent mode when ready
Fixes #218

MCP Server Configuration UX (#ddcc8aa)

Simplified Command Input: Single field for full command (e.g., npx -y @server/name)
Auto-Connect: Newly added servers automatically connect with status notifications
Fixed Form Persistence: Resolved bug where old values persisted when switching modes
Better Shell Parsing: Handles quoted paths and spaces in commands correctly
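A simplified tokenizer that illustrates the single-field command input. It splits on whitespace but honors single and double quotes and backslash escapes, so paths with spaces stay in one argument. It is not a full POSIX shell parser (no variable expansion, globbing or operators), and the function names are hypothetical.

```typescript
// Split a full command line into tokens, honoring quotes and backslash escapes.
function splitCommandLine(input: string): string[] {
  const tokens: string[] = [];
  let current = "";
  let inToken = false;
  let quote: '"' | "'" | null = null;

  for (let i = 0; i < input.length; i++) {
    const ch = input[i];
    if (quote) {
      if (ch === quote) {
        quote = null; // closing quote
      } else if (ch === "\\" && quote === '"' && i + 1 < input.length) {
        current += input[++i]; // escape inside double quotes
      } else {
        current += ch;
      }
    } else if (ch === '"' || ch === "'") {
      quote = ch as '"' | "'";
      inToken = true; // "" still yields an (empty) argument
    } else if (ch === "\\" && i + 1 < input.length) {
      current += input[++i];
      inToken = true;
    } else if (/\s/.test(ch)) {
      if (inToken) {
        tokens.push(current);
        current = "";
        inToken = false;
      }
    } else {
      current += ch;
      inToken = true;
    }
  }
  if (quote) throw new Error("Unterminated quote in command");
  if (inToken) tokens.push(current);
  return tokens;
}

// Hypothetical shape of the stdio server entry the form produces.
function toServerCommand(full: string): { command: string; args: string[] } {
  const [command, ...args] = splitCommandLine(full);
  if (!command) throw new Error("Command is empty");
  return { command, args };
}
```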

MCP Server Logging

Capture Diagnostic Logs: View server output directly in UI
Collapsible Log Viewer: Terminal-style display on MCP config cards
Circular Buffer: In-memory log storage that can be cleared
Clean UI: No [stderr] labels, just timestamp + message
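A minimal fixed-capacity circular buffer of the kind described above. When the buffer is full, the newest line overwrites the oldest, so memory per server stays bounded. The capacity, the entry shape and the class name are assumptions.

```typescript
// Each entry is just a timestamp and a message, with no "[stderr]" prefix.
interface LogEntry {
  timestamp: number;
  message: string;
}

class CircularLogBuffer {
  private items: (LogEntry | undefined)[];
  private start = 0; // index of the oldest entry
  private count = 0;

  constructor(private readonly capacity: number) {
    if (capacity <= 0) throw new Error("capacity must be positive");
    this.items = new Array(capacity);
  }

  push(message: string, timestamp = Date.now()): void {
    const end = (this.start + this.count) % this.capacity;
    this.items[end] = { timestamp, message };
    if (this.count < this.capacity) {
      this.count++;
    } else {
      // Buffer was full: the write replaced the oldest entry, so advance start.
      this.start = (this.start + 1) % this.capacity;
    }
  }

  // Oldest-to-newest snapshot for rendering in the log viewer.
  toArray(): LogEntry[] {
    const out: LogEntry[] = [];
    for (let i = 0; i < this.count; i++) {
      out.push(this.items[(this.start + i) % this.capacity] as LogEntry);
    }
    return out;
  }

  clear(): void {
    this.items = new Array(this.capacity);
    this.start = 0;
    this.count = 0;
  }
}
```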

🎯 Agent Mode Improvements

Stuck Loading State Fixes (#222, #223)

Visible Killswitch Button: Always available when processing with confirmation dialog
Enhanced Keyboard Shortcuts: Work regardless of internal state flags
Mutation State Reset: Properly cleans up React Query mutations on emergency stop
Comprehensive Cleanup: Resets all processing states, ends conversations, stops TTS
Fixes #216

Tool Call Display Enhancements

Immediate Tool Call Display (#202): Tool requests appear before responses, not after
Stable Expansion State (#209): Tool calls remain expanded when new messages arrive
Content-Based IDs: Stable hashing prevents expansion state loss
Full Tool Details (#142): Complete parameters and results with JSON formatting
Pending State Indicators (#71fa447): Clear "Pending..." badge while waiting for responses
Fixes #196, #201
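A sketch of content-based IDs. The ID is hashed from a tool call's name, its arguments serialized with sorted keys, and its position, so UI expansion state keyed by the ID survives re-renders when new messages arrive. Which fields go into the hash, and the 16-character truncation, are assumptions.

```typescript
import { createHash } from "node:crypto";

// Hypothetical tool call shape.
interface ToolCall {
  name: string;
  arguments?: Record<string, unknown>;
}

// JSON serialization with sorted object keys, so {a,b} and {b,a} hash identically.
function canonicalJson(value: unknown): string {
  if (Array.isArray(value)) return `[${value.map(canonicalJson).join(",")}]`;
  if (value !== null && typeof value === "object") {
    const obj = value as Record<string, unknown>;
    const keys = Object.keys(obj).sort();
    return `{${keys.map((k) => `${JSON.stringify(k)}:${canonicalJson(obj[k])}`).join(",")}}`;
  }
  return JSON.stringify(value) ?? "null";
}

function toolCallId(call: ToolCall, messageIndex: number): string {
  // Undefined arguments are treated as {} instead of crashing.
  // messageIndex keeps two identical calls at different positions distinct.
  const payload = `${messageIndex}:${call.name}:${canonicalJson(call.arguments ?? {})}`;
  return createHash("sha256").update(payload).digest("hex").slice(0, 16);
}
```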

Conversation History Improvements (#210)

Complete History Saved: All messages, tool calls, and results preserved
Accurate Timestamps: Original message timestamps maintained
Proper Ordering: Message sequence correctly preserved
Fixes #195

Empty Response Handling (#52a92c1)

Graceful Fallback: Handles null/empty LLM responses without crashing
Case-Insensitive Detection: Better error pattern matching
Continued Execution: Logs errors and continues instead of crashing
Fixes #172

🎨 UI/UX Enhancements

Profile Management System (#3760558)

Save/Load Profiles: Create named profiles with custom guideline configurations
3 Default Profiles: Default, Git & Version Control, AI Coding Agent
Import/Export: Share profiles between installations
Persistent Storage: Profiles saved and restored across sessions
Fixes #199

Waveform & Panel Improvements (#205)

Size Matching: Panel automatically sizes to accommodate full waveform (70 bars)
Persistent Resize: User resize preferences saved per mode (normal/agent/textInput)
Minimum Width: Enforced ~172px minimum to prevent waveform cutoff
Aerospace/niri Support: Proper floating behavior in tiling window managers
Fixes #203, #186

Settings & Navigation

Settings Menu Restored (#ffbde3c): Added back to the tray menu, without the non-functional keybind
History Section (#149): Renamed "Conversations" to "History" for better clarity
Model Selector Focus Fix (#192): Prevents focus loss while typing
Fixes #194, #197

🔊 TTS (Text-to-Speech) Improvements

Enhanced Kill Switch (#193, #188)

Global TTS Manager: Centralized control for all audio elements
Emergency Stop Integration: Stops TTS on kill switch and ESC key
Auto-Play Prevention: Blocks TTS after agent termination
Stop TTS Button: Visible control in settings sidebar
Playing State Tracking: Button only shows when TTS is actively playing
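A sketch of a centralized TTS manager. Any object with `pause()` and a writable `currentTime` works here, and an HTMLAudioElement satisfies that interface. The class, its methods and the "block until reset" rule are hypothetical illustrations of the kill-switch and auto-play behavior listed above.

```typescript
// Minimal interface satisfied by HTMLAudioElement (and by test fakes).
interface Playable {
  pause(): void;
  currentTime: number;
}

class TtsManager {
  private active = new Set<Playable>();
  private blocked = false; // set after an emergency stop

  // Returns false when playback is blocked, e.g. after agent termination.
  register(audio: Playable): boolean {
    if (this.blocked) return false;
    this.active.add(audio);
    return true;
  }

  finished(audio: Playable): void {
    this.active.delete(audio);
  }

  // Drives the "Stop TTS" button visibility.
  get isPlaying(): boolean {
    return this.active.size > 0;
  }

  // Kill switch / ESC: stop everything and block auto-play until reset.
  stopAll(): void {
    this.active.forEach((audio) => {
      audio.pause();
      audio.currentTime = 0;
    });
    this.active.clear();
    this.blocked = true;
  }

  // Called when a new agent run starts.
  reset(): void {
    this.blocked = false;
  }
}
```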

CORS Support for Remote Server (#881512f)

Configurable Origins: Defaults to * for development
Preflight Handling: Skip auth for OPTIONS requests
UI Controls: Manage CORS settings in remote server configuration
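The CORS behavior above can be sketched as a framework-agnostic function. Browsers send an OPTIONS preflight without the Authorization header, which is why the preflight has to bypass bearer-token auth. Header names follow the Fetch/CORS specification. The function name, the option names and the allowed methods and headers are assumptions.

```typescript
interface CorsOptions {
  allowedOrigins: string[]; // ["*"] is the development default
}

interface CorsDecision {
  headers: Record<string, string>;
  skipAuth: boolean; // preflight requests carry no Authorization header
  endWithStatus?: number; // respond immediately, e.g. 204 for a preflight
}

function handleCors(method: string, origin: string | undefined, opts: CorsOptions): CorsDecision {
  const headers: Record<string, string> = {};
  const wildcard = opts.allowedOrigins.includes("*");
  if (origin && (wildcard || opts.allowedOrigins.includes(origin))) {
    headers["Access-Control-Allow-Origin"] = wildcard ? "*" : origin;
    if (!wildcard) headers["Vary"] = "Origin";
  }
  if (method.toUpperCase() === "OPTIONS") {
    headers["Access-Control-Allow-Methods"] = "GET, POST, OPTIONS";
    headers["Access-Control-Allow-Headers"] = "Authorization, Content-Type";
    return { headers, skipAuth: true, endWithStatus: 204 };
  }
  return { headers, skipAuth: false };
}
```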

🐛 Bug Fixes

Tool Execution IDs (#eb00cae): Handle undefined arguments without crashing
MCP Server Cleanup (#0606d70): Emergency stop no longer kills persistent MCP servers
Quoted Paths (#daaea0b): Proper shell-like parsing for commands with spaces
Linux Desktop Integration (#bc11c72): Fixed app menu appearance, icons, and PATH symlink
Linux Startup Notification (#9b7c554): Disabled distracting "SpeakMCP is ready" popup
macOS Floating Panel (#8d6f72f): Proper z-order and focusability for Aerospace compatibility

🔧 Technical Improvements

Developer Experience

Node Version Pinning (#306dc17): Added .nvmrc specifying Node v20.19.5
Worktree Setup Script (#c69e2b7): Fast worktree setup (~30sec vs ~3min)
UI Debug Mode (#192): Track focus, renders, and state changes with DEBUG_UI=1
Enhanced Logging: Comprehensive debug output throughout codebase

Code Quality

Optional Chaining (#ec71fc5): Replaced non-null assertions for better resilience
CodeRabbit Suggestions: Addressed review feedback across multiple PRs
TypeScript Fixes: Resolved type errors and improved type safety
Defensive Programming: Added guards and fallbacks throughout

📦 Downloads

Platform Support: macOS (Apple Silicon & Intel)

macOS Builds

DMG: SpeakMCP-0.2.3-arm64.dmg (102M) | SpeakMCP-0.2.3-x64.dmg (109M)
PKG: SpeakMCP-0.2.3-arm64.pkg (101M) | SpeakMCP-0.2.3-x64.pkg (109M)
ZIP: SpeakMCP-0.2.3-arm64.zip (100M) | SpeakMCP-0.2.3-x64.zip (108M)

📥 Download Latest Release

🔄 Migration Notes

No breaking changes - All existing functionality preserved
Automatic migration - Settings and data migrate seamlessly
New features opt-in - All new features work with existing configurations
Backward compatible - Existing API endpoints and data structures unchanged

📝 Technical Details

50+ commits since v0.2.2
Version: 0.2.2 → 0.2.3
Rust crate: Updated to 0.2.3
Node.js: Pinned to v20.19.5

🙏 Acknowledgments

Thanks to all contributors and users who provided feedback!

Key Pull Requests:

#229 - Auto-detection for model capabilities
#224 - MCP initialization progress feedback
#222 - Fix stuck loading state in agent mode
#221 - Improve error handling for providers
#210 - Save complete conversation history
#205 - Waveform size matching and persistence
#193 - Enhanced TTS kill switch
#192 - Model selector focus fix

Issues Closed:

#218 - MCP initialization feedback
#216 - Stuck loading state
#203 - Waveform size issues
#199 - Profile management
#197 - Settings menu restoration
#196 - Tool call expansion state
#195 - Conversation history
#194 - Settings menu removal
#188 - TTS kill switch
#186 - Waveform rendering
#172 - Empty response handling

Full Changelog: v0.2.2...v0.2.3

Released: November 2025 | License: AGPL-3.0


17 Oct 03:04

aj47

v0.2.2

0656376

v0.2.2 - Stable Windows & Linux Builds

SpeakMCP v0.2.2

🎉 What's New

🌐 Remote Server API (Phase 1)

Transform SpeakMCP into an API-accessible AI agent service!

OpenAI-Compatible HTTP Server with /v1/chat/completions and /v1/models endpoints
Secure Bearer Token Authentication with auto-generated API keys
Full Agent Mode Support - Run MCP tools via HTTP API
Flexible Configuration - Configurable port, bind address, and logging
New Settings UI - Manage server, API keys, and view usage instructions

Example Usage:

curl -X POST http://127.0.0.1:3210/v1/chat/completions \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"messages":[{"role":"user","content":"List my files"}]}'
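The same request can be made from TypeScript with the global `fetch` (Node 18+ or a browser). The endpoint, port 3210 and the bearer scheme come from the curl example. The response typing assumes the standard OpenAI chat-completions shape, and the helper names are hypothetical.

```typescript
interface ChatMessage {
  role: "system" | "user" | "assistant";
  content: string;
}

interface ChatRequestInit {
  method: string;
  headers: Record<string, string>;
  body: string;
}

// Build the URL and request options, mirroring the curl flags above.
function buildChatRequest(baseUrl: string, apiKey: string, messages: ChatMessage[]): [string, ChatRequestInit] {
  return [
    `${baseUrl.replace(/\/+$/, "")}/v1/chat/completions`,
    {
      method: "POST",
      headers: {
        Authorization: `Bearer ${apiKey}`,
        "Content-Type": "application/json",
      },
      body: JSON.stringify({ messages }),
    },
  ];
}

async function chat(baseUrl: string, apiKey: string, messages: ChatMessage[]): Promise<string> {
  const [url, init] = buildChatRequest(baseUrl, apiKey, messages);
  const res = await fetch(url, init);
  if (!res.ok) throw new Error(`HTTP ${res.status}: ${await res.text()}`);
  const data = (await res.json()) as { choices: { message: { content: string } }[] };
  return data.choices[0]?.message.content ?? "";
}

// Usage (the remote server must be enabled in Settings → Remote Server):
// const reply = await chat("http://127.0.0.1:3210", "YOUR_API_KEY",
//   [{ role: "user", content: "List my files" }]);
```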

🪟 Windows Build Improvements

✅ Fixed Fastify module loading errors
✅ Custom application icon for Windows
✅ Improved native dependency handling
✅ Production-ready Windows builds

🎯 Agent Control Enhancements

Visual Kill Switch - Stop individual agents with a red X button in the progress window
Confirmation Dialog - Prevents accidental termination
Comprehensive Cleanup - Aborts LLM requests, stops MCP servers, kills processes
Visual Feedback - Clear "Stopped" status with a terminated badge

🧪 Testing Infrastructure

VNC GUI Testing for GitHub Actions
Automated GUI testing in CI/CD
Comprehensive testing documentation

🔧 Improvements

MCP Tool Counter - Display total count of enabled tools (#182)
Remote Server Settings - Improved UI with better layout
macOS Build Fixes - Resolved codesign and architecture issues (#181)
Better Error Handling - Improved stability across platforms

📦 Downloads

Cross-Platform Support: macOS (Apple Silicon & Intel), Windows (x64), Linux (x64)

📥 Download Latest Release

🔄 Migration Notes

No breaking changes - All existing functionality preserved
Remote server disabled by default - Enable in Settings → Remote Server
Automatic migration - Settings and data migrate seamlessly

📝 Technical Details

39 files changed: +6,157 additions, -1,021 deletions
New dependencies: Fastify for HTTP server
Version: 0.2.1 → 0.2.2

🐛 Bug Fixes

Fixed Windows Fastify module loading
Resolved macOS codesign timestamp issues
Fixed architecture compatibility on macOS
Improved native extension handling

📚 Documentation

Remote Server Guide: docs/remote-server-phase-1.md
VNC Testing: .github/VNC_QUICK_START.md
Updated README with new features

🙏 Acknowledgments

Thanks to all contributors and users who provided feedback!

Pull Requests:

#176 - Visual kill switch for agent progress windows
#166 - Remote Server Phase 1: OpenAI-compatible HTTP API

Issues Closed:

#182 - Add total tools enabled counter
#181 - Build launch architecture error
#175 - Visual kill switch request

Full Changelog: v0.2.1...v0.2.2

Released: October 2025 | License: AGPL-3.0


12 Sep 23:49

aj47

v0.2.1

34a1526

v0.2.1

Full Changelog: v0.2.0...v0.2.1

