Commit f23cdb5

Merge pull request #43 from OHF-Voice/synesthesiam-20250619-tts-streaming
1.7.0
2 parents 9682a74 + de392fb commit f23cdb5

36 files changed, +763 -57 lines changed

.github/workflows/test.yml

Lines changed: 2 additions & 2 deletions
@@ -17,13 +17,13 @@ jobs:
     runs-on: ubuntu-latest
     strategy:
       matrix:
-        python_version: ["3.8", "3.9", "3.10", "3.11", "3.12"]
+        python_version: ["3.8", "3.9", "3.10", "3.11", "3.12", "3.13"]
     steps:
       - uses: actions/checkout@v4.1.1
       - uses: actions/setup-python@v5
         with:
           python-version: "${{ matrix.python_version }}"
           cache: "pip"
           cache-dependency-path: pyproject.toml
-      - run: script/setup --dev
+      - run: script/setup --dev --http --zeroconf
       - run: script/test

CHANGELOG.md

Lines changed: 11 additions & 0 deletions
@@ -1,5 +1,16 @@
 # Changelog
 
+## 1.7.0
+
+- Add streaming to `tts`, `handle`, and `asr`
+- Add `context` to more events
+- Add more tests
+
+## 1.6.2
+
+- Fix http requirements
+- Add missing http conf files
+
 ## 1.6.1
 
 - Migrate to pyproject.toml

README.md

Lines changed: 79 additions & 1 deletion
@@ -91,6 +91,7 @@ Describe available services.
             * `installed` - true if currently installed (bool, required)
             * `description` - human-readable description (string, optional)
             * `version` - version of the model (string, optional)
+        * `supports_transcript_streaming` - true if program can stream transcript chunks
     * `tts` - list text to speech services (optional)
         * `models` - list of available models
             * `name` - unique name (required)
@@ -103,6 +104,7 @@ Describe available services.
             * `installed` - true if currently installed (bool, required)
             * `description` - human-readable description (string, optional)
             * `version` - version of the model (string, optional)
+        * `supports_synthesize_streaming` - true if program can stream text chunks
     * `wake` - list wake word detection services( optional )
         * `models` - list of available models (required)
             * `name` - unique name (required)
@@ -123,6 +125,7 @@ Describe available services.
             * `installed` - true if currently installed (bool, required)
             * `description` - human-readable description (string, optional)
             * `version` - version of the model (string, optional)
+        * `supports_handled_streaming` - true if program can stream response chunks
     * `intent` - list intent recognition services (optional)
         * `models` - list of available models (required)
             * `name` - unique name (required)
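Reviewer note (not part of the commit): the three `supports_*_streaming` flags added above let a peer decide whether to use the new streaming events. The sketch below shows one way a client might check them. It assumes the `info` data is available as a plain dict keyed by service name with a list of program entries; the helper name `supports_streaming` and the sample payload are hypothetical, and a missing flag is treated as "no streaming".

```python
from typing import Any, Dict


def supports_streaming(info: Dict[str, Any], service: str, flag: str) -> bool:
    """Return True if any program listed under `service` sets `flag` to true.

    A missing service or a missing flag counts as "no streaming", which is
    presumably how a service that predates these flags would appear.
    """
    programs = info.get(service) or []
    return any(bool(program.get(flag, False)) for program in programs)


# Hypothetical `info` payload, trimmed to the fields used here.
info = {
    "asr": [{"name": "example-asr", "supports_transcript_streaming": True}],
    "tts": [{"name": "example-tts"}],  # flag absent, e.g. an older service
}

use_asr_streaming = supports_streaming(info, "asr", "supports_transcript_streaming")
use_tts_streaming = supports_streaming(info, "tts", "supports_synthesize_streaming")
print(use_asr_streaming, use_tts_streaming)
```

Defaulting to the non-streaming path keeps the check safe against services that never send the flag.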
@@ -160,8 +163,19 @@ Transcribe audio into text.
     * `context` - context from previous interactions (object, optional)
 * `transcript` - response with transcription
     * `text` - text transcription of spoken audio (string, required)
+    * `language` - language of transcript (string, optional)
     * `context` - context for next interaction (object, optional)
 
+Streaming:
+
+1. `transcript-start` - starts stream
+    * `language` - language of transcript (string, optional)
+    * `context` - context from previous interactions (object, optional)
+2. `transcript-chunk`
+    * `text` - part of transcript (string, required)
+3. Original `transcript` event must be sent for backwards compatibility
+4. `transcript-stop` - end of stream
+
 ### Text to Speech
 
 Synthesize audio from text.
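Reviewer note (not part of the commit): the ASR streaming order documented in this hunk can be sketched as a small function that emits the events in sequence. Events are modelled as `(type, data)` tuples for illustration only; this is not the `wyoming` package's own event class. The function name is hypothetical, and the sketch assumes the full `transcript` text is simply the concatenation of the chunks.

```python
from typing import Any, Dict, Iterable, List, Optional, Tuple

# Illustrative event shape: (event type, data dict).
Event = Tuple[str, Dict[str, Any]]


def streaming_transcript_events(
    chunks: Iterable[str], language: Optional[str] = None
) -> List[Event]:
    """Return the events a streaming ASR service would send, in README order."""
    chunk_list = list(chunks)

    start_data: Dict[str, Any] = {}
    if language is not None:
        start_data["language"] = language  # optional per the README

    events: List[Event] = [("transcript-start", start_data)]
    events.extend(("transcript-chunk", {"text": chunk}) for chunk in chunk_list)

    # Assumption: the full text is the concatenation of the chunks.
    full: Dict[str, Any] = {"text": "".join(chunk_list)}
    if language is not None:
        full["language"] = language
    events.append(("transcript", full))  # kept for non-streaming clients
    events.append(("transcript-stop", {}))
    return events


events = streaming_transcript_events(["turn on ", "the lights"], language="en")
print([event_type for event_type, _ in events])
```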
@@ -172,6 +186,20 @@ Synthesize audio from text.
         * `name` - name of voice (string, optional)
         * `language` - language of voice (string, optional)
         * `speaker` - speaker of voice (string, optional)
+
+Streaming:
+
+1. `synthesize-start` - starts stream
+    * `context` - context from previous interactions (object, optional)
+    * `voice` - use a specific voice (optional)
+        * `name` - name of voice (string, optional)
+        * `language` - language of voice (string, optional)
+        * `speaker` - speaker of voice (string, optional)
+2. `synthesize-chunk`
+    * `text` - part of text to synthesize (string, required)
+3. Original `synthesize` message must be sent for backwards compatibility
+4. `synthesize-stop` - end of stream, final audio must be sent
+5. `synthesize-stopped` - sent back to server after final audio
 
 ### Wake Word
 
@@ -222,6 +250,15 @@ Handle structured intents or text directly.
     * `text` - response for user (string, optional)
     * `context` - context for next interactions (object, optional)
 
+Streaming:
+
+1. `handled-start` - starts stream
+    * `context` - context from previous interactions (object, optional)
+2. `handled-chunk`
+    * `text` - part of response (string, required)
+3. Original `handled` message must be sent for backwards compatibility
+4. `handled-stop` - end of stream
+
 ### Audio Output
 
 Play audio stream.
@@ -295,8 +332,23 @@ Pipelines are run on the server, but can be triggered remotely from the server a
 3. → `audio-chunk` (required)
     * Send audio chunks until silence is detected
 4. → `audio-stop` (required)
-5. ← `transcript`
+5. ← `transcript` (required)
     * Contains text transcription of spoken audio
+
+Streaming:
+
+1. → `transcribe` event (optional)
+2. → `audio-start` (required)
+3. → `audio-chunk` (required)
+    * Send audio chunks until silence is detected
+4. ← `transcript-start` (required)
+5. ← `transcript-chunk` (required)
+    * Send transcript chunks as they're produced
+6. → `audio-stop` (required)
+7. ← `transcript` (required)
+    * Sent for backwards compatibility
+8. ← `transcript-stop` (required)
+
 
 ### Text to Speech
 
@@ -306,6 +358,22 @@ Pipelines are run on the server, but can be triggered remotely from the server a
     * One or more audio chunks
 4. ← `audio-stop`
 
+Streaming:
+
+1. → `synthesize-start` event (required)
+3. → `synthesize-chunk` event (required)
+    * Text chunks are sent as they're produced
+3. ← `audio-start`, `audio-chunk` (one or more), `audio-stop`
+    * Audio chunks are sent as they're produced with start/stop
+4. → `synthesize` event
+    * Sent for backwards compatibility
+5. → `synthesize-stop` event
+    * End of text stream
+6. ← Final audio must be sent
+    * `audio-start`, `audio-chunk` (one or more), `audio-stop`
+7. ← `synthesize-stopped`
+    * Tells server that final audio has been sent
+
 ### Wake Word Detection
 
 1. → `detect` event with `names` of wake words to detect (optional)
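Reviewer note (not part of the commit): in the streaming text-to-speech flow added in this hunk, audio arrives as several `audio-start` / `audio-chunk` / `audio-stop` groups, and `synthesize-stopped` marks the end. Below is a minimal sketch of the receiving side, assuming events arrive as `(type, payload)` tuples; that shape and the name `collect_streamed_audio` are illustrative, not the library's API.

```python
from typing import Iterable, List, Optional, Tuple

# Illustrative event shape: (event type, raw audio payload).
ServerEvent = Tuple[str, bytes]


def collect_streamed_audio(events: Iterable[ServerEvent]) -> List[bytes]:
    """Group audio chunks by start/stop pair until `synthesize-stopped` arrives."""
    segments: List[bytes] = []
    current: Optional[bytearray] = None  # set while inside a start/stop pair

    for event_type, payload in events:
        if event_type == "audio-start":
            current = bytearray()
        elif event_type == "audio-chunk":
            if current is None:
                raise ValueError("audio-chunk received before audio-start")
            current.extend(payload)
        elif event_type == "audio-stop":
            if current is None:
                raise ValueError("audio-stop received before audio-start")
            segments.append(bytes(current))
            current = None
        elif event_type == "synthesize-stopped":
            return segments  # final audio has been sent
        # Other event types are ignored in this sketch.

    raise ValueError("stream ended without synthesize-stopped")


incoming = [
    ("audio-start", b""),
    ("audio-chunk", b"\x00\x01"),
    ("audio-stop", b""),
    ("audio-start", b""),
    ("audio-chunk", b"\x02"),
    ("audio-chunk", b"\x03"),
    ("audio-stop", b""),
    ("synthesize-stopped", b""),
]
segments = collect_streamed_audio(incoming)
print(len(segments))
```

Waiting for `synthesize-stopped` rather than the first `audio-stop` matters here, because an `audio-stop` only closes one group of chunks.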
@@ -348,6 +416,16 @@ For text only:
 2. ← `handled` if successful
 3. ← `not-handled` if not successful
 
+Streaming text only (successful):
+
+1. → `transcript` with `text` to handle (required)
+2. ← `handled-start` (required)
+3. ← `handled-chunk` (required)
+    * Chunk of response text
+4. ← `handled` (required)
+    * Sent for backwards compatibility
+5. ← `handled-stop` (required)
+
 ### Audio Output
 
 1. → `audio-start` (required)
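Reviewer note (not part of the commit): because a streaming service sends both `handled-chunk` events and the original `handled` event, a streaming-aware client has to avoid processing the response twice. One possible approach is sketched below. It assumes `(type, data)` tuples as the event shape and that the chunks concatenate to the full response; the name `collect_handled_text` is hypothetical.

```python
from typing import Any, Dict, Iterable, List, Optional, Tuple

# Illustrative event shape: (event type, data dict).
Event = Tuple[str, Dict[str, Any]]


def collect_handled_text(events: Iterable[Event]) -> Optional[str]:
    """Return the response text, or None if the request was not handled."""
    streaming = False
    parts: List[str] = []

    for event_type, data in events:
        if event_type == "handled-start":
            streaming = True
        elif event_type == "handled-chunk" and streaming:
            parts.append(data["text"])
        elif event_type == "handled":
            if not streaming:
                # Non-streaming service: `handled` is the whole reply.
                return data.get("text", "")
            # Inside a stream, `handled` repeats the chunks, so skip it.
        elif event_type == "handled-stop":
            return "".join(parts)
        elif event_type == "not-handled":
            return None

    raise ValueError("stream ended before a final event")


streamed = [
    ("handled-start", {}),
    ("handled-chunk", {"text": "It is "}),
    ("handled-chunk", {"text": "sunny today."}),
    ("handled", {"text": "It is sunny today."}),
    ("handled-stop", {}),
]
print(collect_handled_text(streamed))
```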

pylintrc

Lines changed: 2 additions & 1 deletion
@@ -31,7 +31,8 @@ disable=
   missing-class-docstring,
   missing-function-docstring,
   import-error,
-  consider-using-with
+  consider-using-with,
+  too-many-positional-arguments
 
 [FORMAT]
 expected-line-ending-format=LF

pyproject.toml

Lines changed: 3 additions & 2 deletions
@@ -1,9 +1,9 @@
 [project]
 name = "wyoming"
-version = "1.6.2"
+version = "1.7.0"
 description = "Peer-to-peer protocol for voice assistants"
 readme = "README.md"
-requires-python = ">=3.8.1,<3.13"
+requires-python = ">=3.8.1,<3.14"
 license = {text = "MIT"}
 authors = [
     {name = "Michael Hansen", email = "mike@rhasspy.org"}
@@ -18,6 +18,7 @@ classifiers = [
     "Programming Language :: Python :: 3.10",
     "Programming Language :: Python :: 3.11",
     "Programming Language :: Python :: 3.12",
+    "Programming Language :: Python :: 3.13",
 ]
 
 [project.urls]

script/format

Lines changed: 8 additions & 7 deletions
@@ -9,10 +9,11 @@ _VENV_DIR = _PROGRAM_DIR / ".venv"
 _MODULE_DIR = _PROGRAM_DIR / "wyoming"
 _TESTS_DIR = _PROGRAM_DIR / "tests"
 
-context = venv.EnvBuilder().ensure_directories(_VENV_DIR)
-subprocess.check_call(
-    [context.env_exe, "-m", "black", str(_MODULE_DIR), str(_TESTS_DIR)]
-)
-subprocess.check_call(
-    [context.env_exe, "-m", "isort", str(_MODULE_DIR), str(_TESTS_DIR)]
-)
+if _VENV_DIR.exists():
+    context = venv.EnvBuilder().ensure_directories(_VENV_DIR)
+    python_exe = context.env_exe
+else:
+    python_exe = "python3"
+
+subprocess.check_call([python_exe, "-m", "black", str(_MODULE_DIR), str(_TESTS_DIR)])
+subprocess.check_call([python_exe, "-m", "isort", str(_MODULE_DIR), str(_TESTS_DIR)])

script/lint

Lines changed: 11 additions & 12 deletions
@@ -9,19 +9,18 @@ _VENV_DIR = _PROGRAM_DIR / ".venv"
 _MODULE_DIR = _PROGRAM_DIR / "wyoming"
 _TESTS_DIR = _PROGRAM_DIR / "tests"
 
-context = venv.EnvBuilder().ensure_directories(_VENV_DIR)
-subprocess.check_call(
-    [context.env_exe, "-m", "black", str(_MODULE_DIR), str(_TESTS_DIR), "--check"]
-)
-subprocess.check_call(
-    [context.env_exe, "-m", "isort", str(_MODULE_DIR), str(_TESTS_DIR), "--check"]
-)
-subprocess.check_call(
-    [context.env_exe, "-m", "flake8", str(_MODULE_DIR), str(_TESTS_DIR)]
-)
+if _VENV_DIR.exists():
+    context = venv.EnvBuilder().ensure_directories(_VENV_DIR)
+    python_exe = context.env_exe
+else:
+    python_exe = "python3"
+
 subprocess.check_call(
-    [context.env_exe, "-m", "pylint", str(_MODULE_DIR), str(_TESTS_DIR)]
+    [python_exe, "-m", "black", str(_MODULE_DIR), str(_TESTS_DIR), "--check"]
 )
 subprocess.check_call(
-    [context.env_exe, "-m", "mypy", str(_MODULE_DIR), str(_TESTS_DIR)]
+    [python_exe, "-m", "isort", str(_MODULE_DIR), str(_TESTS_DIR), "--check"]
 )
+subprocess.check_call([python_exe, "-m", "flake8", str(_MODULE_DIR), str(_TESTS_DIR)])
+subprocess.check_call([python_exe, "-m", "pylint", str(_MODULE_DIR), str(_TESTS_DIR)])
+subprocess.check_call([python_exe, "-m", "mypy", str(_MODULE_DIR), str(_TESTS_DIR)])

script/package

Lines changed: 7 additions & 4 deletions
@@ -7,7 +7,10 @@ _DIR = Path(__file__).parent
 _PROGRAM_DIR = _DIR.parent
 _VENV_DIR = _PROGRAM_DIR / ".venv"
 
-context = venv.EnvBuilder().ensure_directories(_VENV_DIR)
-subprocess.check_call(
-    [context.env_exe, "-m", "build", "--wheel", "--sdist"]
-)
+if _VENV_DIR.exists():
+    context = venv.EnvBuilder().ensure_directories(_VENV_DIR)
+    python_exe = context.env_exe
+else:
+    python_exe = "python3"
+
+subprocess.check_call([python_exe, "-m", "build", "--wheel", "--sdist"])

script/test

Lines changed: 7 additions & 2 deletions
@@ -9,5 +9,10 @@ _PROGRAM_DIR = _DIR.parent
 _VENV_DIR = _PROGRAM_DIR / ".venv"
 _TEST_DIR = _PROGRAM_DIR / "tests"
 
-context = venv.EnvBuilder().ensure_directories(_VENV_DIR)
-subprocess.check_call([context.env_exe, "-m", "pytest", _TEST_DIR] + sys.argv[1:])
+if _VENV_DIR.exists():
+    context = venv.EnvBuilder().ensure_directories(_VENV_DIR)
+    python_exe = context.env_exe
+else:
+    python_exe = "python3"
+
+subprocess.check_call([python_exe, "-m", "pytest", _TEST_DIR] + sys.argv[1:])
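Reviewer note (not part of the commit): `script/format`, `script/lint`, `script/package`, and `script/test` now repeat the same "use the venv if it exists, otherwise fall back to `python3`" block. The pattern can be expressed as one function, sketched below. The name `resolve_python` is hypothetical and the commit itself keeps the logic inline in each script.

```python
import tempfile
import venv
from pathlib import Path


def resolve_python(venv_dir: Path) -> str:
    """Return the venv's interpreter if the venv directory exists, else "python3"."""
    if venv_dir.exists():
        # ensure_directories() works out the venv layout (creating any missing
        # subdirectories); env_exe is the interpreter path inside that layout.
        context = venv.EnvBuilder().ensure_directories(venv_dir)
        return context.env_exe
    return "python3"  # fall back to whatever is on PATH


# Demonstrate the fallback with a directory that is guaranteed not to exist.
with tempfile.TemporaryDirectory() as temp_dir:
    missing_venv = Path(temp_dir) / "missing-venv"
    fallback = resolve_python(missing_venv)
    print(fallback)
```

The fallback is presumably what lets the scripts run in environments where dependencies are installed without a `.venv` directory.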

tests/test_audio.py

Lines changed: 1 addition & 0 deletions
@@ -1,4 +1,5 @@
 """Test audio utilities."""
+
 import io
 import wave
 