Add text-to-speech functionality #1412

heyseth · 2025-03-06T02:23:49Z

Context

Text-to-speech is a useful quality-of-life feature that allows users to listen to what Roo is doing while having another window open. This feature is also useful for auditory learners (such as myself), because it adds an alternative way for the user to absorb information without needing to read what Roo is doing.

Implementation

- Abstracts text-to-speech functionality into src/utils/tts.ts
- Uses the say.js npm package for cross-platform TTS
- Uses an utterance queue to ensure messages are read one at a time
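A minimal sketch of the utterance-queue idea, assuming a callback-style speaker function. `UtteranceQueue` and `Speaker` are hypothetical names for illustration, not the code in src/utils/tts.ts. The speaker is injected so the queue can be exercised without audio.

```typescript
// Hypothetical speaker signature: speak `text` at `speed`, then call `done`
// (with an error, if one occurred) once the utterance has finished playing.
type Speaker = (text: string, speed: number, done: (err?: Error) => void) => void

// Plays queued utterances strictly one at a time, in arrival order.
class UtteranceQueue {
	private readonly queue: string[] = []
	private speaking = false
	private readonly speak: Speaker
	private readonly speed: number

	constructor(speak: Speaker, speed = 1.0) {
		this.speak = speak
		this.speed = speed
	}

	enqueue(text: string): void {
		this.queue.push(text)
		this.processNext()
	}

	private processNext(): void {
		// While an utterance is playing, wait: its completion callback resumes the queue.
		if (this.speaking) return
		const next = this.queue.shift()
		if (next === undefined) return
		this.speaking = true
		this.speak(next, this.speed, (err) => {
			if (err) console.error("TTS failed:", err.message)
			this.speaking = false
			this.processNext()
		})
	}
}
```

Because the next utterance only starts from the previous one's completion callback, two messages arriving in quick succession cannot overlap. Assuming say.js's documented `say.speak(text, voice, speed, callback)` signature, a small wrapper around it that leaves the voice at its default could serve as the `Speaker`.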

Screenshots

How to Test

Navigate to the Roo settings page and make sure that "Enable text-to-speech" is checked under Notifications.
Ask Roo to complete a task. If everything is working correctly, Roo should read its messages aloud.

Get in Touch

Message me in the Roo Code Discord at the handle @ocean.smith

Important

Adds text-to-speech functionality using say.js, with settings and state management for enabling/disabling TTS, and integrates it into message handling.

Behavior:
- Adds text-to-speech (TTS) functionality using say.js in src/utils/tts.ts.
- Integrates TTS with message handling in ChatView.tsx to read aloud non-partial, non-JSON say messages.
- Adds TTS enable/disable settings in NotificationSettings.tsx and SettingsView.tsx.
State Management:
- Updates ExtensionStateContext.tsx to include ttsEnabled state management.
- Modifies ClineProvider.ts to handle TTS state and message types ttsEnabled and playTts.
Testing:
- Adds tests for TTS functionality in ClineProvider.test.ts to verify enabling/disabling TTS and message handling.
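A rough sketch of how the two message types named above could be dispatched on the extension side. The message field names (`bool`, `text`), the `TtsState` shape, and `handleTtsMessage` are assumptions for illustration, not the code in ClineProvider.ts.

```typescript
// Hypothetical shapes for the two webview messages named above.
type TtsMessage =
	| { type: "ttsEnabled"; bool: boolean }
	| { type: "playTts"; text: string }

interface TtsState {
	ttsEnabled: boolean
}

// Stores the enable/disable setting, or forwards text to `speak` only while TTS is enabled.
function handleTtsMessage(state: TtsState, message: TtsMessage, speak: (text: string) => void): void {
	switch (message.type) {
		case "ttsEnabled":
			state.ttsEnabled = message.bool
			break
		case "playTts":
			if (state.ttsEnabled) speak(message.text)
			break
	}
}
```

Checking the setting on the extension side means a stray `playTts` message from the webview stays silent when the user has turned TTS off.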

This description was created by Ellipsis for 0d4a743. It will automatically update as commits are pushed.

changeset-bot · 2025-03-06T02:23:54Z

⚠️ No Changeset found

Latest commit: 730548e

Merging this PR will not cause a version bump for any packages. If these changes should not result in a new version, you're good to go. If these changes should result in a version bump, you need to add a changeset.

This PR includes no changesets

When changesets are added to this PR, you'll see the packages that this PR includes changesets for and the associated semver types



hannesrudolph · 2025-03-06T04:14:04Z

What are the chances of adding some speech to text functionality in there while you're at it? ;)

heyseth · 2025-03-06T06:14:00Z

I would like to add speech-to-text in a separate pull request, plus more settings for voice volume, speed, and voice type, if there is interest, of course :-)

hannesrudolph · 2025-03-06T07:22:01Z

There absolutely is

heyseth · 2025-03-06T23:49:40Z

It turns out that adding speech-to-text to Roo is less straightforward than I thought. I think something like Whisper would provide the best-quality local solution, but bundling it with Roo is not easy.

mrubens · 2025-03-08T02:44:22Z

webview-ui/src/components/chat/ChatView.tsx

+		// skip input message
+		if (lastMessage && messages.length > 1) {
+			let text = lastMessage?.text || ""
+
+			if (
+				lastMessage.type === "say" && // is a say message
+				!lastMessage.partial && // not a partial message
+				!text.startsWith("{") && // not a json object
+				text !== lastTtsRef.current // not the same as last TTS message
+			) {
+				try {
+					playTts(text)
+					lastTtsRef.current = text
+				} catch (error) {
+					console.error("Failed to execute text-to-speech:", error)
+				}
+			}
+		}


Do you mind explaining the logic here? Thank you!

Hello! Essentially, I only want to read out the messages the user would expect Roo to read, i.e., messages that appear in the chat interface. The first item in messages is the user input, which we don't need to read aloud. We also don't need to read aloud incomplete messages or JSON objects. I had it check that the message type is say because I didn't want Roo reading aloud ask messages.

Maybe this last behavior should be a toggleable option though?

The code also stores a reference to the last spoken message to prevent duplicate responses from being read.
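The checks described here can be pulled out of the effect into a pure function, which makes them easy to unit-test. `ChatMessage` and `shouldSpeak` are hypothetical names; the real message type in Roo Code carries more fields.

```typescript
// Minimal message shape for this sketch; the real message type has more fields.
interface ChatMessage {
	type: "say" | "ask"
	text?: string
	partial?: boolean
}

// Mirrors the checks in the diff: nothing is spoken while only the initial user
// message exists; after that, the latest message is spoken only if it is a
// complete "say" message, is not a JSON payload, and differs from the last spoken text.
function shouldSpeak(messages: ChatMessage[], lastSpoken: string | undefined): boolean {
	if (messages.length <= 1) return false
	const last = messages[messages.length - 1]
	if (!last) return false
	const text = last.text ?? ""
	return last.type === "say" && !last.partial && !text.startsWith("{") && text !== lastSpoken
}
```

The caller would still keep the last spoken text in a ref, as the diff does with `lastTtsRef`, and pass it in as `lastSpoken`.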

I just pulled the branch down and it did seem to read my messages back to me. Is that unintended? It also still seems to read Mermaid diagrams and JSON.

Pretty cool experience overall though!

@heyseth, sorry, I accidentally resolved this conversation somehow. Any ideas on my last question here?

@mrubens, sorry for the late response! I'm writing some fixes now that should prevent Mermaid diagrams, JSON, and user input messages from being read aloud.

mrubens · 2025-03-09T02:12:10Z

webview-ui/src/context/ExtensionStateContext.tsx

 		filePaths,
 		openedTabs,
 		soundVolume: state.soundVolume,
+		ttsSpeed: state.ttsSpeed,


Do we need ttsEnabled in here too?

I don't think so; the contextValue object is built by spreading the whole state (which already includes ttsEnabled). I only added ttsSpeed because I noticed soundVolume was there, but it looks like neither of them is actually needed.
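A small illustration of why the explicit keys are redundant: object spread already copies every own enumerable property, and re-assigning an existing key with the same value keeps both the value and the key order. `ExtensionState` here is a hypothetical slice of the real state.

```typescript
// Hypothetical slice of the extension state; the real state has many more fields.
interface ExtensionState {
	soundVolume?: number
	ttsEnabled?: boolean
	ttsSpeed?: number
}

const state: ExtensionState = { soundVolume: 0.5, ttsEnabled: true, ttsSpeed: 1.0 }

// Spreading `state` already copies every key, including ttsEnabled and ttsSpeed...
const spreadOnly = { ...state }

// ...so re-listing keys with unchanged values afterward produces an identical object.
const withExplicitKeys = { ...state, soundVolume: state.soundVolume, ttsSpeed: state.ttsSpeed }
```

An explicit key after the spread only matters when it changes the value, for example to supply a default such as `state.ttsSpeed ?? 1.0`.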

heyseth · 2025-03-11T01:42:33Z

What changes remain to be made to make this production ready?

mrubens · 2025-03-11T01:56:06Z

What changes remain to be made to make this production ready?

I asked a question in a thread above that I accidentally resolved earlier - sorry about that.


heyseth · 2025-03-17T22:44:00Z

@mrubens I've modified the logic in ChatView.tsx for reading aloud messages. It should only read aloud regular messages from Roo now, skipping over user input messages, json objects, and mermaid diagrams. Would you mind testing the feature again?
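One way to implement this kind of filtering (see also commit 1b6b830, "ignore markdown and mermaid diagrams in TTS") is a small pre-processing pass before text reaches the speech engine. This is a sketch under assumptions: `prepareForSpeech` is a hypothetical name, and the regular expressions cover only a few common Markdown constructs, not the full CommonMark grammar.

```typescript
// Hypothetical pre-processing step: turn a chat message into plain speakable text.
// Returns undefined when nothing is left worth reading aloud.
function prepareForSpeech(text: string): string | undefined {
	const trimmed = text.trim()
	// Skip raw JSON payloads, matching the original startsWith("{") check.
	if (trimmed.startsWith("{")) return undefined

	const spoken = trimmed
		// Drop fenced blocks entirely, which covers mermaid diagrams and code samples.
		.replace(/`{3}[\s\S]*?`{3}/g, "")
		// Keep link text, drop the URL: [label](url) becomes label.
		.replace(/\[([^\]]*)\]\([^)]*\)/g, "$1")
		// Strip heading markers at the start of each line.
		.replace(/^#{1,6}\s+/gm, "")
		// Strip emphasis asterisks and inline-code backticks.
		.replace(/[*`]/g, "")
		// Collapse the whitespace left behind.
		.replace(/\s+/g, " ")
		.trim()

	return spoken.length > 0 ? spoken : undefined
}
```

Returning `undefined` for a message that was only a diagram lets the caller skip it entirely instead of queuing an empty utterance.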

mrubens · 2025-03-18T00:00:44Z

@mrubens I've modified the logic in ChatView.tsx for reading aloud messages. It should only read aloud regular messages from Roo now, skipping over user input messages, json objects, and mermaid diagrams. Would you mind testing the feature again?

Great! Will take a look tonight.

mrubens

Awesome, seems to work great now!

Add text-to-speech functionality

0d4a743

heyseth requested review from cte and mrubens as code owners March 6, 2025 02:23

github-project-automation bot added this to Roo Code Roadmap Mar 6, 2025

github-project-automation bot moved this to New in Roo Code Roadmap Mar 6, 2025

dosubot bot added the size:L (This PR changes 100-499 lines, ignoring generated files.) and enhancement (New feature or request) labels Mar 6, 2025

ellipsis-dev bot reviewed Mar 6, 2025


webview-ui/src/components/settings/NotificationSettings.tsx (outdated)

src/utils/tts.ts

hannesrudolph moved this from New to PR [Unverified] in Roo Code Roadmap Mar 6, 2025

Merge branch 'RooVetGit:main' into feature/textToSpeech

4cd5545

Merge branch 'RooVetGit:main' into feature/textToSpeech

8d98ce6

mrubens reviewed Mar 8, 2025


heyseth added 4 commits March 8, 2025 12:45

Merge branch 'RooVetGit:main' into feature/textToSpeech

1a47e9d

Add speed config option to text-to-speech

88cf106

Fix test case for tts speed slider

8f19387

Fix test case for tts speed slider (really)

a734d51

mrubens reviewed Mar 9, 2025


heyseth added 4 commits March 8, 2025 19:32

Disabled error message logging in tts.ts

be9e57e

Merge branch 'RooVetGit:main' into feature/textToSpeech

409d67c

Merge branch 'RooVetGit:main' into feature/textToSpeech

da8a98c

Merge branch 'RooVetGit:main' into feature/textToSpeech

0b716f2

heyseth added 2 commits March 16, 2025 16:38

Merge branch 'RooVetGit:main' into feature/textToSpeech

2223762

ignore markdown and mermaid diagrams in TTS

1b6b830

heyseth added 4 commits March 17, 2025 14:58

Merge branch 'feature/textToSpeech' of https://github.com/heyseth/Roo-Code into feature/textToSpeech

b4eed3f

add ttsEnabled and ttsSpeed to GlobalStateKey

552022d

Merge remote-tracking branch 'upstream/main' into feature/textToSpeech

5f32cb9

fix failing webview test for save button

1d7de4b

mrubens added 3 commits March 18, 2025 01:20

Merge remote-tracking branch 'origin/main' into feature/textToSpeech

63d6d64

Translations

d867e0e

Fix tests

730548e

mrubens approved these changes Mar 18, 2025


dosubot bot added the lgtm (This PR has been approved by a maintainer) label Mar 18, 2025

mrubens merged commit f16a49d into RooCodeInc:main Mar 18, 2025
10 checks passed

github-project-automation bot moved this from PR [Pre Approval Review] to Done in Roo Code Roadmap Mar 18, 2025

heyseth deleted the feature/textToSpeech branch March 22, 2025 06:55


3 participants
