@heyseth
Contributor

@heyseth heyseth commented Mar 6, 2025

Context

Text-to-speech is a useful quality-of-life feature that allows users to listen to what Roo is doing while having another window open. This feature is also useful for auditory learners (such as myself), because it adds an alternative way for the user to absorb information without needing to read what Roo is doing.

Implementation

  • Abstracts text-to-speech functionality into src/utils/tts.ts
  • Uses the say.js npm package for cross-platform TTS
  • Uses an utterance queue to ensure messages are read one at a time
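
The utterance queue mentioned above could look roughly like the sketch below. This is a minimal illustration, not the code in src/utils/tts.ts: the names `SpeakFn` and `UtteranceQueue` are hypothetical, and the speaker is injected so the queue can be exercised without an audio backend.

```typescript
// Hypothetical sketch: these names are illustrative, not taken from the PR.
// A speaker takes text and calls onDone when playback finishes or fails.
type SpeakFn = (text: string, onDone: (err?: Error | null) => void) => void

class UtteranceQueue {
	private readonly pending: string[] = []
	private speaking = false
	private readonly speak: SpeakFn

	constructor(speak: SpeakFn) {
		this.speak = speak
	}

	// Add a message; playback starts immediately if nothing is being spoken.
	enqueue(text: string): void {
		this.pending.push(text)
		this.processNext()
	}

	private processNext(): void {
		if (this.speaking) return // an utterance is already in progress
		const next = this.pending.shift()
		if (next === undefined) return // queue drained
		this.speaking = true
		this.speak(next, (err) => {
			if (err) console.error("TTS failed:", err)
			this.speaking = false
			this.processNext() // continue with the next message, even after an error
		})
	}
}
```

With say.js, the injected speaker could plausibly be `(text, done) => say.speak(text, undefined, 1.0, done)`, assuming its `speak(text, voice, speed, callback)` signature.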

Screenshots

image

How to Test

  • Navigate to the Roo settings page and make sure that "Enable text-to-speech" is checked under Notifications.
  • Ask Roo to complete a task. If everything is working correctly, Roo should read aloud its messages.

Get in Touch

Message me in the Roo Code Discord at the handle @ocean.smith


Important

Adds text-to-speech functionality using say.js, with settings and state management for enabling/disabling TTS, and integrates it into message handling.

  • Behavior:
    • Adds text-to-speech (TTS) functionality using say.js in src/utils/tts.ts.
    • Integrates TTS with message handling in ChatView.tsx to read aloud non-partial, non-JSON say messages.
    • Adds TTS enable/disable settings in NotificationSettings.tsx and SettingsView.tsx.
  • State Management:
    • Updates ExtensionStateContext.tsx to include ttsEnabled state management.
    • Modifies ClineProvider.ts to handle TTS state and message types ttsEnabled and playTts.
  • Testing:
    • Adds tests for TTS functionality in ClineProvider.test.ts to verify enabling/disabling TTS and message handling.
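
As a rough illustration of the two message types named in this summary, the sketch below models them as a discriminated union with a small handler. The payload fields (`bool`, `text`) and the `handleTtsMessage` function are assumptions made for the example; the real handling in ClineProvider.ts is not shown in this thread.

```typescript
// Hypothetical message shapes; the real payload fields in the PR may differ.
type TtsMessage =
	| { type: "ttsEnabled"; bool: boolean }
	| { type: "playTts"; text: string }

interface TtsState {
	ttsEnabled: boolean
}

// Returns the text to speak, or undefined when nothing should be played.
function handleTtsMessage(state: TtsState, message: TtsMessage): string | undefined {
	switch (message.type) {
		case "ttsEnabled":
			state.ttsEnabled = message.bool // remember the user's toggle
			return undefined
		case "playTts":
			// Ignore playback requests while the feature is switched off.
			return state.ttsEnabled ? message.text : undefined
	}
}
```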

This description was created by Ellipsis for 0d4a743. It will automatically update as commits are pushed.

@changeset-bot

changeset-bot bot commented Mar 6, 2025

⚠️ No Changeset found

Latest commit: 730548e

Merging this PR will not cause a version bump for any packages. If these changes should not result in a new version, you're good to go. If these changes should result in a version bump, you need to add a changeset.

This PR includes no changesets

When changesets are added to this PR, you'll see the packages that this PR includes changesets for and the associated semver types

@dosubot dosubot bot added size:L This PR changes 100-499 lines, ignoring generated files. enhancement New feature or request labels Mar 6, 2025
@hannesrudolph
Collaborator

What are the chances of adding some speech to text functionality in there while you're at it? ;)

@hannesrudolph hannesrudolph moved this from New to PR [Unverified] in Roo Code Roadmap Mar 6, 2025
@heyseth
Contributor Author

heyseth commented Mar 6, 2025

I would like to add speech-to-text in a separate pull request, plus more settings for voice volume, speed, and voice type. If there is interest, of course :-)

@hannesrudolph
Collaborator

There absolutely is

@heyseth
Contributor Author

heyseth commented Mar 6, 2025

Turns out adding speech-to-text to Roo is less straightforward than I thought it would be. I think using something like Whisper would provide the best-quality local solution, but bundling it with Roo is not easy.

Comment on lines 669 to 686
// skip input message
if (lastMessage && messages.length > 1) {
    let text = lastMessage?.text || ""

    if (
        lastMessage.type === "say" && // is a say message
        !lastMessage.partial && // not a partial message
        !text.startsWith("{") && // not a json object
        text !== lastTtsRef.current // not the same as last TTS message
    ) {
        try {
            playTts(text)
            lastTtsRef.current = text
        } catch (error) {
            console.error("Failed to execute text-to-speech:", error)
        }
    }
}
Collaborator

Do you mind explaining the logic here? Thank you!

Contributor Author

@heyseth heyseth Mar 8, 2025

Hello, essentially I only want to read out the messages that the user would expect Roo to read, i.e., messages that appear in the chat interface. The first item in messages is the user input, which we don't need to read aloud. We also don't need to read aloud incomplete messages or JSON objects. The reason I had it check whether the message type is say is that I didn't want Roo reading aloud ask messages such as this:

image

Maybe this last behavior should be a toggleable option though?

The code also stores a reference to the last spoken message to prevent duplicate responses from being read.
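
The rules described in this comment can be restated as a pure function, which makes each condition easy to test in isolation. This is only a sketch: `ChatMessage` is a trimmed-down, assumed shape and `textToSpeak` is a hypothetical name, not a function from the PR.

```typescript
// Minimal message shape assumed for illustration only.
interface ChatMessage {
	type: "say" | "ask"
	text?: string
	partial?: boolean
}

// Returns the text to read aloud, or undefined if the latest message should be skipped.
function textToSpeak(messages: ChatMessage[], lastSpoken: string): string | undefined {
	if (messages.length <= 1) return undefined // only the user's input so far
	const last = messages[messages.length - 1]
	if (!last) return undefined
	const text = last.text ?? ""
	if (last.type !== "say") return undefined // skip "ask" prompts
	if (last.partial) return undefined // message is still streaming
	if (text.startsWith("{")) return undefined // looks like a JSON payload
	if (text === lastSpoken) return undefined // already read aloud
	return text
}
```

Keeping the decision separate from the `playTts` call means the component only has to update its "last spoken" reference when a string is returned.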

Collaborator

@mrubens mrubens Mar 9, 2025

I just pulled the branch down and it did seem to read my messages back to me - is that unintended? It also still seems to read Mermaid and JSON.

Pretty cool experience overall though!

Collaborator

@heyseth sorry I accidentally resolved this conversation somehow. Any ideas on my last question here?

Contributor Author

@mrubens sorry for the late response! I'm writing some fixes now that should prevent Mermaid diagrams, JSON, and user input messages from being read aloud.

filePaths,
openedTabs,
soundVolume: state.soundVolume,
ttsSpeed: state.ttsSpeed,
Collaborator

Do we need ttsEnabled in here too?

Contributor Author

@heyseth heyseth Mar 9, 2025

I don't think so; the contextValue object is built using the spread operator on the whole state (which already includes ttsEnabled). I only put ttsSpeed in because I noticed that soundVolume was there, but it looks like neither of those is actually needed.
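
The point about the spread operator can be illustrated with a small sketch. The `ExtensionState` shape and `buildContextValue` function below are hypothetical and heavily trimmed, and the sketch assumes the spread appears before the explicit fields; the real context value contains many more fields.

```typescript
// Hypothetical, trimmed-down state; the real extension state has many more fields.
interface ExtensionState {
	soundVolume: number
	ttsEnabled: boolean
	ttsSpeed: number
}

function buildContextValue(state: ExtensionState, filePaths: string[]) {
	return {
		...state, // copies every state field, including ttsEnabled
		filePaths,
		soundVolume: state.soundVolume, // redundant: already provided by the spread
		ttsSpeed: state.ttsSpeed, // redundant for the same reason
	}
}
```

Because the spread already copies `ttsEnabled`, listing it again would not change the resulting object.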

@heyseth
Contributor Author

heyseth commented Mar 11, 2025

What changes remain to be made to make this production ready?

@mrubens
Collaborator

mrubens commented Mar 11, 2025

> What changes remain to be made to make this production ready?

I asked a question in a thread above that I accidentally resolved earlier - sorry about that.

@heyseth
Contributor Author

heyseth commented Mar 17, 2025

@mrubens I've modified the logic in ChatView.tsx for reading messages aloud. It should now read aloud only regular messages from Roo, skipping user input messages, JSON objects, and Mermaid diagrams. Would you mind testing the feature again?
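
The updated ChatView.tsx logic is not shown in this thread. One way such filtering could work is sketched below under stated assumptions: JSON is detected by attempting to parse text that starts with "{", and Mermaid diagrams are assumed to arrive as fenced `mermaid` code blocks. All function names are hypothetical.

```typescript
// Hypothetical helpers; the actual checks in ChatView.tsx are not shown in this thread.
const FENCE = "`".repeat(3) // triple backtick, built here to keep the pattern readable

// True only when the whole text parses as JSON and starts with "{".
function isJsonObject(text: string): boolean {
	const trimmed = text.trim()
	if (!trimmed.startsWith("{")) return false
	try {
		JSON.parse(trimmed)
		return true
	} catch {
		return false // starts with "{" but is ordinary prose
	}
}

// Remove fenced mermaid blocks so only the surrounding prose is spoken.
function stripMermaid(text: string): string {
	const pattern = new RegExp(FENCE + "mermaid[\\s\\S]*?" + FENCE, "g")
	return text.replace(pattern, "").trim()
}

// Returns the speakable part of a message, or undefined if nothing remains.
function speakableText(text: string): string | undefined {
	if (isJsonObject(text)) return undefined
	const cleaned = stripMermaid(text)
	return cleaned.length > 0 ? cleaned : undefined
}
```

Parsing rather than checking only the first character avoids silencing ordinary prose that happens to begin with a brace.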

@mrubens
Collaborator

mrubens commented Mar 18, 2025

> @mrubens I've modified the logic in ChatView.tsx for reading aloud messages. It should only read aloud regular messages from Roo now, skipping over user input messages, json objects, and mermaid diagrams. Would you mind testing the feature again?

Great! Will take a look tonight.

Collaborator

@mrubens mrubens left a comment

Awesome, seems to work great now!

@dosubot dosubot bot added the lgtm This PR has been approved by a maintainer label Mar 18, 2025
@mrubens mrubens merged commit f16a49d into RooCodeInc:main Mar 18, 2025
10 checks passed
@github-project-automation github-project-automation bot moved this from PR [Pre Approval Review] to Done in Roo Code Roadmap Mar 18, 2025
@heyseth heyseth deleted the feature/textToSpeech branch March 22, 2025 06:55