Open
Introduce the TTS subsystem with a pluggable adapter interface and an Edge TTS backend (via Microsoft's edge-readaloud WebSocket API).

- TtsAdapter interface with Capabilities/Synthesize/Stream
- Edge WebSocket client matching readest's protocol (one-shot connection, big-endian audio frames, Sec-MS-GEC token generation)
- AudioConfig with go-playground/validator struct tags
- ParamConstraint supporting both discrete options and continuous ranges
- Unit tests with mock WebSocket server, integration tests behind -tags=integration with audio file output

Made-with: Cursor
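As a rough illustration of the adapter interface and `ParamConstraint` named in this commit, the following is a minimal sketch. Only the names `TtsAdapter`, `Capabilities`, `Synthesize`, `Stream`, `AudioConfig`, and `ParamConstraint` come from the commit message; every field, method signature, and the `Validate` helper are assumptions made for the example and may differ from the actual code in this PR.

```go
package main

import (
	"context"
	"fmt"
)

// ParamConstraint (field layout assumed) describes the allowed values of one
// tunable parameter: a discrete set when Options is non-empty, otherwise the
// continuous range [Min, Max].
type ParamConstraint struct {
	Options  []float64
	Min, Max float64
}

// Validate is a hypothetical helper that checks a value against the constraint.
func (c ParamConstraint) Validate(v float64) error {
	if len(c.Options) > 0 {
		for _, o := range c.Options {
			if o == v {
				return nil
			}
		}
		return fmt.Errorf("value %v is not one of %v", v, c.Options)
	}
	if v < c.Min || v > c.Max {
		return fmt.Errorf("value %v is outside [%v, %v]", v, c.Min, c.Max)
	}
	return nil
}

// AudioConfig (fields assumed) carries per-request synthesis settings.
type AudioConfig struct {
	Voice  string
	Format string
	Speed  float64
	Pitch  float64
}

// Capabilities (fields assumed) advertises what a backend supports.
type Capabilities struct {
	Formats []string
	Speed   ParamConstraint
	Pitch   ParamConstraint
}

// TtsAdapter is the pluggable backend interface; the signatures are assumed.
type TtsAdapter interface {
	Capabilities() Capabilities
	Synthesize(ctx context.Context, text string, cfg AudioConfig) ([]byte, error)
	Stream(ctx context.Context, text string, cfg AudioConfig) (<-chan []byte, error)
}
```

Under these assumptions, a caller could check `cfg.Speed` against `Capabilities().Speed.Validate` before calling `Synthesize`, so that unsupported values are rejected before a WebSocket connection is opened.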
…ntegration

- Add tts_providers table, CRUD API, and Edge TTS adapter with WebSocket synthesis
- Add TTS provider management page with voice/format/speed/pitch config and test synthesis
- Add config schema support on provider meta for extensibility
- Add tts_provider_id to bot settings for per-bot TTS configuration
- Fix unsupported Edge TTS formats (ogg, audio-16khz) and improve WS error handling
- Add 500-char text limit on test synthesis (frontend + backend)
…eb playback

- Add text_to_speech agent tool that LLM can invoke when user requests voice
- Stream TTS audio to temp file (StreamToFile) to keep memory usage low
- Persist voice attachments via media.Service (content-addressed, per-bot container)
- Extract TTS voice from tool_call_end events in ChannelInboundProcessor
- Conditionally enable TTS action in resolver when bot has tts_provider_id
- Add inline audio player in web chat UI for voice/audio attachments
- Make VoiceConfig fields optional so adapters can use their own defaults
- Resolve merge conflicts between tts and main branches
- Convert TTS tool from custom TS implementation to Go MCP ToolExecutor (internal/mcp/providers/tts/provider.go), aligning with main's server-side tool gateway pattern
- Merge both tts_provider and browser_context features in DB schema, queries, settings DTOs, and Vue bot-settings UI
- Move TTS frontend pages from packages/web/ to apps/web/ to match main's directory structure
- Remove obsolete client-side AllowedActions/AgentAction enums
- Delete old TS TTS tool (now served via MCP)
- Regenerate sqlc, swagger, and SDK

Made-with: Cursor
Restore TTS voice attachment extraction/rendering across inbound and web chat, renumber TTS migrations to 0028/0029, and fix golangci-lint findings.
Refactor TTS adapter interface to expose per-model capabilities (DefaultModel, Models, ResolveModel) and route synthesis through a specific model. Add tts_models table, auto-import models on provider creation, full model CRUD API, and update bot settings to select a TTS model instead of a provider.
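One way the per-model lookup could behave is sketched below. The method names `DefaultModel`, `Models`, and `ResolveModel` come from the commit message; the `ModelInfo` type, the `modelCatalog` helper, the fallback-to-default rule for an empty ID, and the model IDs in the usage note are all hypothetical.

```go
package main

import "fmt"

// ModelInfo (hypothetical) describes one synthesis model and its capabilities.
type ModelInfo struct {
	ID      string
	Formats []string
}

// modelCatalog (hypothetical) is a helper an adapter could embed to serve
// the per-model methods.
type modelCatalog struct {
	defaultID string
	models    []ModelInfo
}

// DefaultModel returns the ID used when the caller does not pick a model.
func (c modelCatalog) DefaultModel() string { return c.defaultID }

// Models lists every model the adapter exposes.
func (c modelCatalog) Models() []ModelInfo { return c.models }

// ResolveModel maps an ID to its ModelInfo. An empty ID falls back to the
// default model (assumed behavior); an unknown ID is an error.
func (c modelCatalog) ResolveModel(id string) (ModelInfo, error) {
	if id == "" {
		id = c.defaultID
	}
	for _, m := range c.models {
		if m.ID == id {
			return m, nil
		}
	}
	return ModelInfo{}, fmt.Errorf("unknown TTS model %q", id)
}
```

With a catalog such as `modelCatalog{defaultID: "model-a", models: []ModelInfo{{ID: "model-a"}, {ID: "model-b"}}}`, a bot setting with no model selected would resolve to `model-a`, while a stale or mistyped ID would fail before synthesis starts.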
Unfortunately, because the platforms do not support streaming voice APIs, we currently send blocking audio attachments instead.
Summary
Add a TTS system as an agent tool
Features
Related
#85