
Commit 7a9b893

minor fix
1 parent 57a1ba6, commit 7a9b893


4 files changed: +54 additions, -56 deletions


README.md

Lines changed: 45 additions & 52 deletions
```diff
@@ -80,49 +80,36 @@ then go to chrome://extensions/ and load unpacked the extensions/chrome/ dir
 
 ## Demo Video
 
-[![TalkiTo Demo](https://img.youtube.com/vi/FJdYTYZK_0U/0.jpg)](https://youtu.be/FJdYTYZK_0U)
+[![TalkiTo Demo](https://img.youtube.com/vi/pf8jFt0smqs/0.jpg)](https://youtu.be/pf8jFt0smqs)
 
 ## AI Assistant Compatibility
 
-| AI Assistant | Method | Status |
-|------------------------------|---------------|---------------------|
-| **Claude Code** | Terminal | **Fully Supported** |
-| **Codex Cli** | Terminal | **Fully Supported** |
-| bolt.new | Web Extension | Output Only |
-| v0.dev | Web Extension | Output Only |
-| replit.com | Web Extension | Output Only |
-| Gemini CLI | Terminal | In Progress |
-| Aider | Terminal | In Progress |
-| Cursor | Terminal | In Progress |
-| Continue | Terminal | In Progress |
+| AI Assistant | Method | Status |
+|-----------------|---------------|---------------------|
+| **Claude Code** | Terminal | **Fully Supported** |
+| **Codex Cli** | Terminal | **Fully Supported** |
+| bolt.new | Web Extension | Output Only |
+| v0.dev | Web Extension | Output Only |
+| replit.com | Web Extension | Output Only |
+| Other agents | Terminal | In Progress |
 
 
 
-### Voice Mode
+### Run with Claude Code
 
-When you run `talkito claude`, voice mode is enabled by default:
+run `talkito claude`
 
-1. **Automatic voice interaction**: Claude will:
-   - Speak all responses using TTS
-   - Listen for your voice input after speaking
-   - Process your speech as the next user message
-   - Continue this loop automatically
+### Run with Codex Cli
 
-2. **Control voice mode**:
-   - Voice mode starts ON by default
-   - Say or type "turn off talkito" to disable voice interaction
-   - Say or type "turn on talkito" to re-enable if turned off
+run `talkito codex`
 
-3. **Unified input handling**: All inputs are processed as user messages:
-   - Voice dictation: Your spoken words
-   - Slack messages: From configured channels
-   - WhatsApp messages: From configured numbers
+### Run as an MCP server
 
-4. **Communication modes**:
-   - Say "start slack mode #channel-name" to auto-send responses to Slack
-   - Say "start whatsapp mode +1234567890" to auto-send responses to WhatsApp
-   - Say "stop slack/whatsapp mode" to disable
+run `talkito --mcp-server`
 
+### Run the TalkiTo configuration menu
+
+run `talkito`
 
 #### Advanced Options
 
```
```diff
@@ -131,13 +118,16 @@ When you run `talkito claude`, voice mode is enabled by default:
 talkito --dont-auto-skip-tts claude
 
 # Use different TTS providers
-talkito --tts-provider openai --tts-voice nova echo "Hello with OpenAI"
 talkito --tts-provider polly --tts-voice Matthew --tts-region us-west-2 echo "Hello with AWS"
 talkito --tts-provider azure --tts-voice en-US-JennyNeural echo "Hello with Azure"
 talkito --tts-provider gcloud --tts-voice en-US-Journey-F echo "Hello with Google"
+talkito --tts-provider kittentts --tts-voice expr-voice-3-f echo "Hello with KittenTTS"
+talkito --tts-provider kokoro --tts-voice af_heart echo "Hello with Kokoro (local)"
 
 # Use different ASR providers
 talkito --asr-provider gcloud --asr-language en-US claude
+AZURE_SPEECH_KEY=... AZURE_SPEECH_REGION=eastus talkito --asr-provider azure claude
+WHISPER_MODEL=small WHISPER_COMPUTE_TYPE=int8 talkito --asr-provider local_whisper claude
 talkito --asr-language es-ES echo "Hola mundo" # Spanish recognition
 
 # Enable remote communication (configure via environment variables)
```
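The new ASR examples pass provider settings as shell-style `VAR=value` prefixes. A script that wraps `talkito` might check for those variables before launching it. The sketch below is hypothetical: `REQUIRED_ENV` and `missing_env` are invented names, not TalkiTo APIs, and the table covers only the two ASR providers whose requirements this commit documents (Azure needs `AZURE_SPEECH_KEY` and `AZURE_SPEECH_REGION`; Local Whisper needs no key).

```python
from typing import Dict, List, Mapping

# Hypothetical requirements table, limited to what this commit's README states.
# Providers not listed are treated as having no known required variables.
REQUIRED_ENV: Dict[str, List[str]] = {
    "azure": ["AZURE_SPEECH_KEY", "AZURE_SPEECH_REGION"],
    "local_whisper": [],  # on-device, no API key required
}


def missing_env(provider: str, env: Mapping[str, str]) -> List[str]:
    """Return the required variables for `provider` that are unset or empty in `env`."""
    return [name for name in REQUIRED_ENV.get(provider, []) if not env.get(name)]
```

Passing `os.environ` as `env` checks the current shell; an empty result means the variables are already set, so the inline prefix can be omitted.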
````diff
@@ -191,26 +181,6 @@ except KeyboardInterrupt:
     asr.stop_dictation()
 ```
 
-### MCP Server Usage
-
-Talkito includes an MCP (Model Context Protocol) server that allows AI applications to use TTS and ASR capabilities:
-
-```bash
-# Install TalkiTo (includes MCP support)
-pip install talkito
-
-# Run as MCP server
-talkito --mcp-server
-```
-
-The MCP server provides tools for:
-- **Core**: `turn_on`/`turn_off` (enable voice mode), `get_talkito_status`
-- **TTS**: `enable_tts`/`disable_tts`, `speak_text`, `skip_current_speech`, `configure_tts`
-- **ASR**: `enable_asr`/`disable_asr`, `start_voice_input`/`stop_voice_input`, `get_dictated_text`
-- **Communication**: `start_whatsapp_mode`/`stop_whatsapp_mode`, `start_slack_mode`/`stop_slack_mode`, `send_whatsapp`, `send_slack`, `get_messages`
-
-Configure your AI application to connect to the talkito MCP server for voice capabilities.
-
 ## Provider Configuration
 
 ### Text-to-Speech (TTS) Providers
````
```diff
@@ -255,6 +225,18 @@ Configure your AI application to connect to the talkito MCP server for voice cap
 - **Voices**: aura-asteria-en, aura-luna-en, aura-stella-en, and more
 - **Usage**: `--tts-provider deepgram --tts-voice aura-asteria-en`
 
+#### KittenTTS (Local / Offline)
+- **Install**: `pip install https://github.com/KittenML/KittenTTS/releases/download/0.1/kittentts-0.1.0-py3-none-any.whl soundfile phonemizer`
+- **Setup**: No API key required. First run prompts to download the selected model (default `kitten-tts-nano-0.2`) into the Hugging Face cache. Configure `KITTENTTS_MODEL` and `KITTENTTS_VOICE` to pick different quality/voice options.
+- **Best for**: Ultra-lightweight CPU-only voices that stay on-device.
+- **Usage**: `KITTENTTS_MODEL=kitten-tts-nano-0.2 talkito --tts-provider kittentts --tts-voice expr-voice-3-f`
+
+#### Kokoro (Local / Offline)
+- **Install**: `pip install 'kokoro>=0.9.4' soundfile phonemizer`
+- **Setup**: No API key required. TalkiTo will download Kokoro weights the first time you run it (set `KOKORO_LANGUAGE`, `KOKORO_VOICE`, `KOKORO_SPEED` to control defaults).
+- **Best for**: High-quality multilingual voices without sending audio to a cloud provider.
+- **Usage**: `talkito --tts-provider kokoro --tts-voice af_heart --tts-language en-US`
+
 ### Automatic Speech Recognition (ASR) Providers
 
 #### Google Speech Recognition (Default)
```
```diff
@@ -292,6 +274,17 @@ Configure your AI application to connect to the talkito MCP server for voice cap
 - **Features**: Streaming transcription
 - **Usage**: `--asr-provider aws --aws-region us-west-2`
 
+#### Azure Speech Services
+- **Get API Key**: https://azure.microsoft.com/en-us/services/cognitive-services/speech-services/
+- **Setup**: Set `AZURE_SPEECH_KEY` and `AZURE_SPEECH_REGION`, then `pip install azure-cognitiveservices-speech`
+- **Features**: Low-latency streaming dictation with automatic punctuation
+- **Usage**: `AZURE_SPEECH_KEY=... AZURE_SPEECH_REGION=eastus talkito --asr-provider azure`
+
+#### Local Whisper (On-Device)
+- **Install**: `pip install faster-whisper` (default) or `WHISPER_COREML=1 pip install pywhispercpp` for Apple Silicon/CoreML acceleration
+- **Setup**: No API key required. Configure `WHISPER_MODEL` (e.g., `small`, `medium`), `WHISPER_DEVICE` (`cpu`, `cuda`, or `mps`), and `WHISPER_COMPUTE_TYPE` (`int8`, `int8_float16`, etc.). Models are cached locally and TalkiTo will prompt before downloading unless `TALKITO_AUTO_APPROVE_DOWNLOADS=1`.
+- **Usage**: `WHISPER_MODEL=small WHISPER_COMPUTE_TYPE=int8 talkito --asr-provider local_whisper`
+
 ### Communication Providers (Remote Interaction)
 
 #### Twilio SMS
```
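The Local Whisper entry describes three settings (`WHISPER_MODEL`, `WHISPER_DEVICE`, `WHISPER_COMPUTE_TYPE`) and a download prompt that `TALKITO_AUTO_APPROVE_DOWNLOADS=1` bypasses. The sketch below shows one way that behavior could be modelled. It assumes hypothetical helper names (`WhisperSettings`, `read_whisper_settings`, `approve_download`) and fallback values (`small`, `cpu`, `int8`) borrowed from the README examples rather than from TalkiTo's code. The prompt function is injectable so the logic can be exercised without a terminal.

```python
from dataclasses import dataclass
from typing import Callable, Mapping


@dataclass(frozen=True)
class WhisperSettings:
    model: str
    device: str
    compute_type: str


def read_whisper_settings(env: Mapping[str, str]) -> WhisperSettings:
    """Read Whisper options from an environment mapping such as os.environ."""
    # Fallbacks are assumptions taken from the README examples, not TalkiTo's defaults.
    return WhisperSettings(
        model=env.get("WHISPER_MODEL", "small"),
        device=env.get("WHISPER_DEVICE", "cpu"),
        compute_type=env.get("WHISPER_COMPUTE_TYPE", "int8"),
    )


def approve_download(model: str, env: Mapping[str, str],
                     ask: Callable[[str], str] = input) -> bool:
    """Return True if downloading `model` is approved."""
    # TALKITO_AUTO_APPROVE_DOWNLOADS=1 skips the interactive prompt entirely.
    if env.get("TALKITO_AUTO_APPROVE_DOWNLOADS") == "1":
        return True
    answer = ask(f"Download Whisper model '{model}'? [y/N] ")
    return answer.strip().lower() in ("y", "yes")
```

In practice `env` would be `os.environ`, and the resulting fields would be handed to whichever backend is installed (for example, faster-whisper's `WhisperModel` accepts `device` and `compute_type` keyword arguments).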

talkito/__version__.py

Lines changed: 1 addition & 1 deletion
```diff
@@ -16,6 +16,6 @@
 
 """Version information for talkito package"""
 
-__version__ = "0.1.3"
+__version__ = "1.0.0"
 __author__ = "Robert Macrae"
 __license__ = "AGPL-3.0-or-later"
```

talkito/asr.py

Lines changed: 1 addition & 1 deletion
```diff
@@ -1529,7 +1529,7 @@ def stream(self, engine, microphone) -> None:
 
 try:
     # Get the cached model with timeout
-    model = get_cached_local_whisper_model(timeout_seconds=10.0)
+    model = get_cached_local_whisper_model(timeout_seconds=60.0)
     if model is None:
         raise ValueError("Failed to load local whisper model within timeout")
 
```
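This change raises the wait for the cached Whisper model from 10 to 60 seconds before the `ValueError` path is taken, which plausibly gives slow first loads more room. `get_cached_local_whisper_model` itself is not shown in this diff, so the sketch below is only a hypothetical illustration of a cached, timeout-bounded loader built on `threading.Event.wait`; `CachedLoader` is an invented name and is not TalkiTo's implementation.

```python
import threading
from typing import Callable, Generic, Optional, TypeVar

T = TypeVar("T")


class CachedLoader(Generic[T]):
    """Hypothetical cached loader: loads once in the background, waits with a timeout."""

    def __init__(self, load: Callable[[], T]) -> None:
        self._load = load
        self._model: Optional[T] = None
        self._done = threading.Event()
        self._started = False
        self._lock = threading.Lock()

    def _run(self) -> None:
        try:
            self._model = self._load()
        finally:
            # Signal completion even if loading raised; _model then stays None.
            self._done.set()

    def get(self, timeout_seconds: float) -> Optional[T]:
        # Start the background load exactly once; later calls share its result.
        with self._lock:
            if not self._started:
                self._started = True
                threading.Thread(target=self._run, daemon=True).start()
        # Event.wait returns False if the load has not finished within the timeout.
        if not self._done.wait(timeout_seconds):
            return None
        return self._model
```

A caller then mirrors the diff: `model = loader.get(timeout_seconds=60.0)`, raising `ValueError` when the result is `None`. Because the load keeps running after a timeout, a later call can still pick up the finished model.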

talkito/tts.py

Lines changed: 7 additions & 2 deletions
```diff
@@ -2171,8 +2171,13 @@ def tts_worker(engine: str):
 
 log_message("DEBUG", f"{is_currently_speaking=} {is_currently_playing=} {playing_long_enough=} {time_playing=}")
 
-# Only skip current item if something is actually speaking AND a process exists AND it's not an exception
-if is_currently_speaking and is_currently_playing and (not current_speech_item or not current_speech_item.is_exception):
+# Only skip current item if something is actually speaking, a process exists, and neither item is an exception
+current_is_exception = bool(current_speech_item and current_speech_item.is_exception)
+
+if current_is_exception:
+    log_message("INFO", "Current speech item is an exception; skipping auto-skip")
+
+if is_currently_speaking and is_currently_playing and not current_is_exception:
     needs_skip = True
     log_message("INFO", f"Auto-skipping current audio for new text ({len(text_to_speak)} chars)")
 
```
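The `tts_worker` refactor hoists the exception check into `current_is_exception` so it can be logged on its own. By De Morgan's law, `not item or not item.is_exception` is equivalent to `not (item and item.is_exception)`, so the skip decision itself should be unchanged. The sketch below checks that exhaustively over every combination of inputs; `SpeechItem`, `old_should_skip`, and `new_should_skip` are hypothetical stand-ins for the real object and the inline conditions.

```python
from dataclasses import dataclass
from itertools import product
from typing import Optional


@dataclass
class SpeechItem:
    # Hypothetical stand-in; only the is_exception flag matters here.
    is_exception: bool


def old_should_skip(speaking: bool, playing: bool, item: Optional[SpeechItem]) -> bool:
    # Condition before the commit.
    return speaking and playing and (not item or not item.is_exception)


def new_should_skip(speaking: bool, playing: bool, item: Optional[SpeechItem]) -> bool:
    # Condition after the commit: the exception check becomes a named flag.
    current_is_exception = bool(item and item.is_exception)
    return speaking and playing and not current_is_exception


def conditions_agree() -> bool:
    # Compare both versions for every speaking/playing flag and item shape.
    items = [None, SpeechItem(False), SpeechItem(True)]
    return all(
        old_should_skip(s, p, i) == new_should_skip(s, p, i)
        for s, p, i in product([False, True], [False, True], items)
    )
```

Since the two conditions agree everywhere, the visible effect of the commit in this block is the new INFO log line emitted when the current item is an exception.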
