
feat(tts): introduce TTS system#195

Open
Fodesu wants to merge 7 commits into memohai:main from Fodesu:tts

Conversation

Fodesu (Collaborator) commented Mar 6, 2026

Summary

Add a TTS system and expose it as an agent tool.

Features

  • Generic TTS layer
  • Web TTS config
  • Agent tool
  • edge-tts backend
  • TTS support for channels

Related

#85

Introduce the TTS subsystem with a pluggable adapter interface and an
Edge TTS backend (via Microsoft's edge-readaloud WebSocket API).

- TtsAdapter interface with Capabilities/Synthesize/Stream
- Edge WebSocket client matching readest's protocol (one-shot connection,
  big-endian audio frames, Sec-MS-GEC token generation)
- AudioConfig with go-playground/validator struct tags
- ParamConstraint supporting both discrete options and continuous ranges
- Unit tests with mock WebSocket server, integration tests behind
  -tags=integration with audio file output
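
To make the adapter bullets above concrete, here is a minimal sketch of what an interface with Capabilities/Synthesize/Stream and a `ParamConstraint` covering both discrete options and continuous ranges could look like. The field names, method signatures, and the `Check` helper are assumptions for illustration only; they are not taken from the PR's code.

```go
package main

import (
	"context"
	"fmt"
	"io"
)

// ParamConstraint describes the allowed values of one tunable parameter.
// Hypothetical shape: a non-empty Options list means "discrete values only";
// otherwise the inclusive range [Min, Max] applies.
type ParamConstraint struct {
	Options []float64
	Min     float64
	Max     float64
}

// Check reports whether v satisfies the constraint.
func (c ParamConstraint) Check(v float64) error {
	if len(c.Options) > 0 {
		for _, o := range c.Options {
			if o == v {
				return nil
			}
		}
		return fmt.Errorf("value %v is not one of %v", v, c.Options)
	}
	if v < c.Min || v > c.Max {
		return fmt.Errorf("value %v is outside [%v, %v]", v, c.Min, c.Max)
	}
	return nil
}

// Capabilities is what an adapter advertises (assumed fields).
type Capabilities struct {
	Formats []string
	Speed   ParamConstraint
	Pitch   ParamConstraint
}

// TtsAdapter mirrors the three operations named in the commit message.
// The parameter lists are assumptions, not the PR's real signatures.
type TtsAdapter interface {
	Capabilities() Capabilities
	Synthesize(ctx context.Context, text string) ([]byte, error)
	Stream(ctx context.Context, text string, w io.Writer) error
}
```

With this shape, a config UI could render a dropdown when `Options` is set and a slider otherwise, while the backend reuses `Check` to reject out-of-range values.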

Made-with: Cursor
Fodesu added 5 commits March 6, 2026 14:46
…ntegration

- Add tts_providers table, CRUD API, and Edge TTS adapter with WebSocket synthesis
- Add TTS provider management page with voice/format/speed/pitch config and test synthesis
- Add config schema support on provider meta for extensibility
- Add tts_provider_id to bot settings for per-bot TTS configuration
- Fix unsupported Edge TTS formats (ogg, audio-16khz) and improve WS error handling
- Add 500-char text limit on test synthesis (frontend + backend)
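
The backend half of the 500-char limit mentioned above could be as small as the following sketch. It assumes "char" means Unicode code points rather than bytes, and the names `maxTestTextChars`, `ErrTextTooLong`, and `validateTestText` are hypothetical.

```go
package main

import (
	"errors"
	"unicode/utf8"
)

// maxTestTextChars is the limit stated in the commit message.
const maxTestTextChars = 500

// ErrTextTooLong is a hypothetical sentinel error for oversized input.
var ErrTextTooLong = errors.New("test synthesis text exceeds 500 characters")

// validateTestText rejects input longer than the limit.
// Assumption: length is counted in runes, so multi-byte text such as
// CJK is not penalized for its UTF-8 byte length.
func validateTestText(text string) error {
	if utf8.RuneCountInString(text) > maxTestTextChars {
		return ErrTextTooLong
	}
	return nil
}
```

Counting runes keeps the backend closer to a frontend character counter than `len(text)` would, although the two can still disagree on characters outside the Basic Multilingual Plane if the frontend counts UTF-16 code units.
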
…eb playback

- Add text_to_speech agent tool that LLM can invoke when user requests voice
- Stream TTS audio to temp file (StreamToFile) to keep memory usage low
- Persist voice attachments via media.Service (content-addressed, per-bot container)
- Extract TTS voice from tool_call_end events in ChannelInboundProcessor
- Conditionally enable TTS action in resolver when bot has tts_provider_id
- Add inline audio player in web chat UI for voice/audio attachments
- Make VoiceConfig fields optional so adapters can use their own defaults
- Resolve merge conflicts between tts and main branches
- Convert TTS tool from custom TS implementation to Go MCP ToolExecutor
  (internal/mcp/providers/tts/provider.go), aligning with main's
  server-side tool gateway pattern
- Merge both tts_provider and browser_context features in DB schema,
  queries, settings DTOs, and Vue bot-settings UI
- Move TTS frontend pages from packages/web/ to apps/web/ to match
  main's directory structure
- Remove obsolete client-side AllowedActions/AgentAction enums
- Delete old TS TTS tool (now served via MCP)
- Regenerate sqlc, swagger, and SDK

Made-with: Cursor
Restore TTS voice attachment extraction/rendering across inbound and web chat, renumber TTS migrations to 0028/0029, and fix golangci-lint findings.
Fodesu marked this pull request as ready for review March 7, 2026 14:50
Fodesu requested review from chen-ran and sheepbox8646 March 7, 2026 14:50
Refactor TTS adapter interface to expose per-model capabilities
(DefaultModel, Models, ResolveModel) and route synthesis through
a specific model. Add tts_models table, auto-import models on
provider creation, full model CRUD API, and update bot settings
to select a TTS model instead of a provider.
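
The per-model lookup described in this commit (DefaultModel, Models, ResolveModel) might behave roughly like the sketch below. The `ModelInfo` fields, the `modelRegistry` type, and the fall-back-to-default rule for an empty ID are all assumptions made for illustration.

```go
package main

import "fmt"

// ModelInfo is a hypothetical per-model capability record.
type ModelInfo struct {
	ID      string
	Formats []string
}

// modelRegistry is a hypothetical holder for one provider's models.
type modelRegistry struct {
	defaultID string
	models    []ModelInfo
}

// DefaultModel returns the ID used when the caller names no model.
func (r modelRegistry) DefaultModel() string { return r.defaultID }

// Models lists every model the provider exposes.
func (r modelRegistry) Models() []ModelInfo { return r.models }

// ResolveModel looks up a model by ID.
// Assumption: an empty ID falls back to the default model.
func (r modelRegistry) ResolveModel(id string) (ModelInfo, error) {
	if id == "" {
		id = r.defaultID
	}
	for _, m := range r.models {
		if m.ID == id {
			return m, nil
		}
	}
	return ModelInfo{}, fmt.Errorf("unknown TTS model %q", id)
}
```

Routing synthesis through the resolved `ModelInfo` lets each model advertise its own formats instead of sharing one provider-wide capability set.
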
Fodesu (Collaborator, Author) commented Mar 7, 2026

Unfortunately, the platforms do not support streaming voice APIs, so for now we send voice as blocking audio attachments instead.
