@roomote
Contributor

@roomote roomote bot commented Aug 7, 2025

This PR implements integration with Google Cloud Text-to-Speech and Microsoft Azure Speech Services as requested in #6827.

Summary of Changes

Core Implementation

  • Created a provider-based architecture for TTS services with a common interface
  • Implemented three TTS providers:
    • Native (existing OS-based TTS using the say package)
    • Google Cloud Text-to-Speech
    • Microsoft Azure Speech Services
  • Added TtsManager to coordinate between different providers and handle provider switching
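The PR's actual interface is not shown in this conversation, so the following is only a minimal sketch of what a provider-based design with a coordinating manager could look like. All type names, method names, and signatures here (`TtsProvider`, `SpeakOptions`, `register`, `activeProviderId`) are assumptions for illustration; only the provider identifiers and the `TtsManager` / `setActiveProvider` names come from the PR.

```typescript
// Hypothetical shapes; the PR's real interface may differ.
type ProviderId = "native" | "google-cloud" | "azure";

interface SpeakOptions {
  voice?: string;
  speed?: number;
}

interface TtsProvider {
  readonly id: ProviderId;
  initialize(): Promise<void>;
  speak(text: string, options?: SpeakOptions): Promise<void>;
  stop(): void;
}

class TtsManager {
  private readonly providers = new Map<ProviderId, TtsProvider>();
  private activeProvider: TtsProvider | undefined;

  register(provider: TtsProvider): void {
    this.providers.set(provider.id, provider);
  }

  // Switch providers: stop anything in flight, then initialize the new one.
  async setActiveProvider(id: ProviderId): Promise<TtsProvider> {
    const next = this.providers.get(id);
    if (!next) {
      throw new Error(`Unknown TTS provider: ${id}`);
    }
    this.activeProvider?.stop();
    await next.initialize();
    this.activeProvider = next;
    return next;
  }

  get activeProviderId(): ProviderId | undefined {
    return this.activeProvider?.id;
  }

  async speak(text: string, options?: SpeakOptions): Promise<void> {
    // Default to native TTS when nothing has been selected yet.
    const provider = this.activeProvider ?? (await this.setActiveProvider("native"));
    await provider.speak(text, options);
  }
}
```

Callers only talk to the manager, so adding a provider means implementing the interface and registering it, without touching call sites.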

UI Updates

  • Added provider selection dropdown in notification settings
  • Created configuration UI components for Google Cloud and Azure settings
  • Added secure input fields for API keys and configuration parameters

Settings & Configuration

  • Extended global settings to include TTS provider configuration
  • Added support for storing API keys and provider-specific settings
  • Updated message handlers to process new TTS-related settings

Key Features

  • Backward Compatibility: Default to native TTS, ensuring existing functionality remains intact
  • Provider Flexibility: Users can switch between providers based on their needs
  • Secure Configuration: API keys are handled securely through VSCode's state management
  • Voice Selection: Support for listing and selecting available voices from cloud providers
  • Error Handling: Graceful fallback to native TTS if cloud providers fail

Testing

  • All existing tests pass
  • Linting and type checking completed successfully
  • Manual testing recommended for cloud provider integration

Configuration Required

For Google Cloud TTS:

  1. Enable Text-to-Speech API in Google Cloud Console
  2. Create an API key
  3. Enter credentials in settings

For Azure Speech Services:

  1. Create Speech Services resource in Azure Portal
  2. Copy subscription key and region
  3. Enter credentials in settings
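To show where these credentials end up, here is a hedged sketch that builds (but does not send) a synthesis request for each service's public REST endpoint. The helper names, the `HttpRequest` shape, the Azure default voice, the output format, and the `User-Agent` value are assumptions for illustration; the PR may use SDKs or different parameters.

```typescript
// Plain description of an HTTP request; nothing is sent by this sketch.
interface HttpRequest {
  url: string;
  method: "POST";
  headers: Record<string, string>;
  body: string;
}

// SSML is XML, so user text must be escaped before it is embedded.
function escapeXml(text: string): string {
  return text
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&apos;");
}

// Google Cloud: the API key from step 2 is sent in a header, keeping it out of the URL.
function buildGoogleRequest(apiKey: string, text: string, voice = "en-US-Neural2-F"): HttpRequest {
  return {
    url: "https://texttospeech.googleapis.com/v1/text:synthesize",
    method: "POST",
    headers: { "Content-Type": "application/json", "X-Goog-Api-Key": apiKey },
    body: JSON.stringify({
      input: { text },
      voice: { languageCode: "en-US", name: voice },
      audioConfig: { audioEncoding: "MP3" },
    }),
  };
}

// Azure: the region selects the host; the subscription key goes in a header.
function buildAzureRequest(
  subscriptionKey: string,
  region: string,
  text: string,
  voice = "en-US-JennyNeural", // assumed default, not taken from the PR
): HttpRequest {
  return {
    url: `https://${region}.tts.speech.microsoft.com/cognitiveservices/v1`,
    method: "POST",
    headers: {
      "Ocp-Apim-Subscription-Key": subscriptionKey,
      "Content-Type": "application/ssml+xml",
      "X-Microsoft-OutputFormat": "audio-16khz-128kbitrate-mono-mp3",
      "User-Agent": "tts-sketch",
    },
    body:
      `<speak version="1.0" xml:lang="en-US">` +
      `<voice name="${escapeXml(voice)}">${escapeXml(text)}</voice></speak>`,
  };
}
```

Keeping request construction separate from the network call makes the credential handling easy to unit-test without hitting either service.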

Fixes #6827


Important

Integrates Google Cloud and Azure TTS services with a provider-based architecture, UI updates for configuration, and secure handling of API keys.

  • Core Implementation:
    • Introduces a provider-based architecture for TTS services with a common interface.
    • Implements NativeTtsProvider, GoogleCloudTtsProvider, and AzureTtsProvider in TtsManager.
    • Adds TtsManager to manage providers and handle TTS operations.
  • UI Updates:
    • Adds TTS provider selection dropdown in NotificationSettings.tsx.
    • Implements GoogleCloudTtsSettings.tsx and AzureTtsSettings.tsx for provider-specific configurations.
  • Settings & Configuration:
    • Extends global-settings.ts to include TTS provider configurations and API keys.
    • Updates webviewMessageHandler.ts to handle TTS-related settings and provider initialization.
  • Key Features:
    • Ensures backward compatibility by defaulting to native TTS.
    • Allows users to switch between TTS providers.
    • Securely handles API keys using VSCode's state management.
    • Supports voice selection from cloud providers.
    • Provides error handling with fallback to native TTS if cloud providers fail.

This description was created by Ellipsis for 4112aa6.

- Add TTS provider interface and implementations for native, Google Cloud, and Azure
- Create TtsManager to coordinate between different TTS providers
- Update UI to allow provider selection and configuration
- Add settings for API keys and provider-specific configuration
- Maintain backward compatibility with existing native TTS functionality

Fixes #6827
@roomote roomote bot requested review from cte, jr and mrubens as code owners August 7, 2025 23:28
@dosubot dosubot bot added the size:XXL, enhancement, and UI/UX labels Aug 7, 2025
} = state

// Initialize TTS manager with provider configuration
await initializeTts({
Contributor

In the TTS initialization block, consider adding additional logging or error-handling around the initializeTts() call so that any configuration issues (e.g. missing API keys) are clearly logged.
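One way to act on this suggestion, sketched under assumptions: validate the configuration first so a missing key produces a specific log line, then guard the call itself. `TtsConfig`, its Azure field names, `findConfigProblems`, and `initializeTtsSafely` are hypothetical; `initializeTts` is passed in as a parameter because its real signature is not shown here.

```typescript
// Hypothetical config shape; only `googleCloudTtsApiKey` appears in the PR's schema.
interface TtsConfig {
  provider: "native" | "google-cloud" | "azure";
  googleCloudTtsApiKey?: string;
  azureSubscriptionKey?: string;
  azureRegion?: string;
}

// Returns human-readable problems; an empty list means the config looks usable.
function findConfigProblems(config: TtsConfig): string[] {
  const problems: string[] = [];
  const isBlank = (value?: string): boolean => !value || value.trim() === "";
  if (config.provider === "google-cloud" && isBlank(config.googleCloudTtsApiKey)) {
    problems.push("Google Cloud API key is missing");
  }
  if (config.provider === "azure") {
    if (isBlank(config.azureSubscriptionKey)) problems.push("Azure subscription key is missing");
    if (isBlank(config.azureRegion)) problems.push("Azure region is missing");
  }
  return problems;
}

// Logs configuration issues and initialization failures instead of letting them pass silently.
async function initializeTtsSafely(
  config: TtsConfig,
  initializeTts: (config: TtsConfig) => Promise<void>,
  log: (message: string) => void = console.warn,
): Promise<boolean> {
  const problems = findConfigProblems(config);
  if (problems.length > 0) {
    log(`[TTS] Skipping ${config.provider} initialization: ${problems.join("; ")}`);
    return false;
  }
  try {
    await initializeTts(config);
    return true;
  } catch (error) {
    const reason = error instanceof Error ? error.message : String(error);
    log(`[TTS] Failed to initialize ${config.provider}: ${reason}`);
    return false;
  }
}
```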

await updateGlobalState("googleCloudTtsApiKey", googleCloudApiKey)
// Re-initialize TTS with new config
const gcState = await provider.getState()
await initializeTts({
Contributor

For new TTS configuration message handlers (e.g. for 'ttsProvider', 'googleCloudTtsApiKey', etc.), consider wrapping the re-initialization calls to initializeTts() in try/catch blocks. This will help prevent an unhandled error from crashing the handler and will allow a user‐friendly error message.
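A small wrapper is one way to apply this to every TTS settings handler at once rather than repeating try/catch in each case. This is a sketch with hypothetical names (`withErrorBoundary`, `showError`); in a VS Code extension `showError` could be backed by `vscode.window.showErrorMessage`, but that wiring is an assumption and is not part of the PR.

```typescript
type Handler<M> = (message: M) => Promise<void>;

// Wraps a message handler so a thrown error becomes a user-facing message
// instead of an unhandled rejection.
function withErrorBoundary<M extends { type: string }>(
  handler: Handler<M>,
  showError: (text: string) => void,
): Handler<M> {
  return async (message: M): Promise<void> => {
    try {
      await handler(message);
    } catch (error) {
      const reason = error instanceof Error ? error.message : String(error);
      showError(`Could not apply TTS setting "${message.type}": ${reason}`);
    }
  };
}
```

A handler for `googleCloudTtsApiKey` would then be registered as `withErrorBoundary(handleGoogleKey, showError)`, where `handleGoogleKey` stands for whatever function performs the state update and re-initialization.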

Contributor Author

@roomote roomote bot left a comment

I reviewed my own code and found bugs I introduced 5 minutes ago. Classic.

ttsSpeed: z.number().optional(),
ttsProvider: z.enum(["native", "google-cloud", "azure"]).optional(),
ttsVoice: z.string().optional(),
googleCloudTtsApiKey: z.string().optional(),
Contributor Author

Security concern: These API keys should be stored using VSCode's SecretStorage API instead of global state. Storing sensitive credentials in global state could expose them through settings sync or exports.

Consider moving these to SecretStorage instead of adding them to the global settings schema.
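A rough sketch of that move, assuming the extension has access to `context.secrets`. The `SecretStore` interface below mirrors the `get` / `store` / `delete` methods of VS Code's `SecretStorage` and is declared locally so the example runs without the `vscode` module; `TtsSecrets` and `InMemorySecretStore` are hypothetical helpers, not code from the PR.

```typescript
// Structural subset of VS Code's SecretStorage (get/store/delete).
interface SecretStore {
  get(key: string): PromiseLike<string | undefined>;
  store(key: string, value: string): PromiseLike<void>;
  delete(key: string): PromiseLike<void>;
}

const GOOGLE_KEY = "googleCloudTtsApiKey";

class TtsSecrets {
  private readonly secrets: SecretStore;

  constructor(secrets: SecretStore) {
    this.secrets = secrets;
  }

  // An empty or missing value removes the secret rather than storing a blank string.
  async setGoogleApiKey(value: string | undefined): Promise<void> {
    const trimmed = value?.trim() ?? "";
    if (trimmed === "") {
      await this.secrets.delete(GOOGLE_KEY);
    } else {
      await this.secrets.store(GOOGLE_KEY, trimmed);
    }
  }

  async getGoogleApiKey(): Promise<string | undefined> {
    return this.secrets.get(GOOGLE_KEY);
  }
}

// In-memory stand-in for tests; an extension would pass `context.secrets` instead.
class InMemorySecretStore implements SecretStore {
  private readonly values = new Map<string, string>();

  async get(key: string): Promise<string | undefined> {
    return this.values.get(key);
  }

  async store(key: string, value: string): Promise<void> {
    this.values.set(key, value);
  }

  async delete(key: string): Promise<void> {
    this.values.delete(key);
  }
}
```

With this split, the settings schema would carry only non-sensitive fields such as `ttsProvider` and `ttsVoice`, while keys are read from the secret store at initialization time.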


try {
if (!this.activeProvider) {
await this.setActiveProvider("native")
Contributor Author

Missing error recovery here. If the active provider is null and setActiveProvider fails, we should have a fallback mechanism. Could we try falling back to native TTS instead of silently failing?
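The recovery path described here could look roughly like the following. It is a sketch, not the PR's code: `activateWithFallback` is a hypothetical helper, and `activate` stands in for whatever `setActiveProvider` does internally.

```typescript
type ProviderId = "native" | "google-cloud" | "azure";

function describeError(error: unknown): string {
  return error instanceof Error ? error.message : String(error);
}

// Tries the preferred provider, falls back to native, and throws a descriptive
// error if even native fails, so the failure is never silent.
async function activateWithFallback(
  preferred: ProviderId,
  activate: (id: ProviderId) => Promise<void>,
  log: (message: string) => void = console.warn,
): Promise<ProviderId> {
  try {
    await activate(preferred);
    return preferred;
  } catch (error) {
    if (preferred === "native") {
      // Nothing left to fall back to: surface the failure to the caller.
      throw new Error(`Native TTS failed to start: ${describeError(error)}`);
    }
    log(`[TTS] ${preferred} unavailable (${describeError(error)}); falling back to native`);
  }
  try {
    await activate("native");
    return "native";
  } catch (error) {
    throw new Error(`No TTS provider available: ${describeError(error)}`);
  }
}
```

The returned identifier tells the caller which provider actually ended up active, which is useful for updating the settings UI after a fallback.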

await provider.postStateToWebview()
if (message.value !== undefined) {
- Terminal.setShellIntegrationTimeout(message.value)
+ Terminal.setShellIntegrationTimeout(Number(message.value))
Contributor Author

Type coercion without validation could result in NaN values. Should we validate the input before converting to ensure we don't store invalid numbers?
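A minimal sketch of validating before converting. `parseTimeoutMs` is a hypothetical helper and the bounds are placeholder assumptions, not values from the codebase. Note that `Number("")` evaluates to `0`, so blank strings need their own check in addition to the `NaN` check.

```typescript
// Returns a finite number within [min, max], or undefined when the input is unusable.
function parseTimeoutMs(value: unknown, min = 0, max = 60000): number | undefined {
  if (typeof value !== "number" && typeof value !== "string") {
    return undefined;
  }
  if (typeof value === "string" && value.trim() === "") {
    return undefined; // Number("") would silently become 0
  }
  const parsed = Number(value);
  if (!Number.isFinite(parsed) || parsed < min || parsed > max) {
    return undefined;
  }
  return parsed;
}
```

The handler would then call `Terminal.setShellIntegrationTimeout` only when the result is not `undefined`, leaving the previous value in place otherwise.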

input: { text },
voice: {
languageCode: "en-US",
name: options?.voice || "en-US-Neural2-F",
Contributor Author

Hardcoded default voice might not be available. Should we validate against available voices first or handle the error if this voice doesn't exist?
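One possible answer, sketched with hypothetical names: resolve the voice against the list the provider reports before building the request. `VoiceInfo` is a simplified shape assumed for illustration and does not match either service's response format exactly.

```typescript
// Simplified, assumed shape for an entry in the provider's voice list.
interface VoiceInfo {
  name: string;
  languageCode: string;
}

// Prefers the requested voice when it exists, then any voice for the language.
// Returns undefined when nothing matches, so the caller can omit the name
// or report the problem instead of sending a voice that may not exist.
function resolveVoice(
  requested: string | undefined,
  available: VoiceInfo[],
  languageCode = "en-US",
): string | undefined {
  if (requested && available.some((voice) => voice.name === requested)) {
    return requested;
  }
  return available.find((voice) => voice.languageCode === languageCode)?.name;
}
```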

const path = require("path")
const os = require("os")

const tempFile = path.join(os.tmpdir(), `tts-${Date.now()}.mp3`)
Contributor Author

Potential temp-file leak if an error occurs between creating the temp file and deleting it. Consider using a try-finally block to ensure cleanup happens even if playback fails.
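The try-finally pattern suggested here can be sketched as below. `withTempAudioFile` is a hypothetical helper, and the `use` callback stands in for the playback step, which is not shown in this diff. The sketch also swaps the `Date.now()` file name for `mkdtemp`, which is a deliberate change from the snippet under review to avoid name collisions.

```typescript
import { promises as fs } from "fs";
import * as os from "os";
import * as path from "path";

// Writes audio to a unique temp location, hands the path to `use`, and always
// removes the file afterwards, whether `use` succeeds or throws.
async function withTempAudioFile<T>(
  audio: Uint8Array,
  use: (filePath: string) => Promise<T>,
): Promise<T> {
  // mkdtemp creates a uniquely named directory under the OS temp dir.
  const dir = await fs.mkdtemp(path.join(os.tmpdir(), "tts-"));
  const filePath = path.join(dir, "speech.mp3");
  try {
    await fs.writeFile(filePath, audio);
    return await use(filePath);
  } finally {
    // force: true makes cleanup a no-op if the file was never written.
    await fs.rm(dir, { recursive: true, force: true });
  }
}
```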


return (
<div className="flex flex-col gap-4 p-4 border border-vscode-panel-border rounded">
<h4 className="text-sm font-semibold">Google Cloud TTS Configuration</h4>
Contributor Author

Missing i18n support. These hardcoded strings should use the translation system for consistency with the rest of the application.
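As a rough illustration of the idea, independent of whichever translation library the application actually uses: headings are looked up by key, with a fallback to English and then to the key itself. The key names, the catalog layout, the German string, and this `t` function are all placeholders invented for the sketch.

```typescript
// Placeholder catalogs; real keys and translations would live in the app's locale files.
type Catalog = { [key: string]: string | undefined };

const catalogs: { [locale: string]: Catalog | undefined } = {
  en: {
    "settings:tts.googleCloud.title": "Google Cloud TTS Configuration",
    "settings:tts.azure.title": "Azure Speech Services Configuration",
  },
  de: {
    "settings:tts.googleCloud.title": "Google Cloud TTS-Konfiguration",
  },
};

// Looks up a key in the requested locale, then English, then returns the key
// itself so that a missing translation is visible rather than blank.
function t(key: string, locale = "en"): string {
  return catalogs[locale]?.[key] ?? catalogs["en"]?.[key] ?? key;
}
```

In the component, the heading body would become something like `{t("settings:tts.googleCloud.title")}` in place of the literal string.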


return (
<div className="flex flex-col gap-4 p-4 border border-vscode-panel-border rounded">
<h4 className="text-sm font-semibold">Azure Speech Services Configuration</h4>
Contributor Author

Missing i18n support here as well. All user-facing strings should be translatable.

@hannesrudolph hannesrudolph added the Issue/PR - Triage label Aug 8, 2025
@daniel-lxs
Member

Closing; the author of the issue will implement it.

@daniel-lxs daniel-lxs closed this Aug 11, 2025
@github-project-automation github-project-automation bot moved this from Triage to Done in Roo Code Roadmap Aug 11, 2025
@github-project-automation github-project-automation bot moved this from New to Done in Roo Code Roadmap Aug 11, 2025

Labels

enhancement, Issue/PR - Triage, size:XXL, UI/UX

Projects

Archived in project

Development

Successfully merging this pull request may close these issues.

Feature: Integrate Google Cloud and Microsoft Azure Text-to-Speech Services

4 participants