Commit a45369e

release v4
1 parent 673e66a commit a45369e

4 files changed: 25 additions and 7 deletions

coagulator/coagulator.py

Lines changed: 2 additions & 2 deletions
@@ -16,8 +16,8 @@
 import websockets.asyncio.server

 def g(): pass #globals
-g.provider_rev = 3
-g.user_rev = 3
+g.provider_rev = 4
+g.user_rev = 4
 g.next_client_id = 1
 g.next_web_id = 10000000
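
Review note: this commit bumps `g.provider_rev` and `g.user_rev` here and `USER_REVISION` in the user client in lockstep. The diff does not show how these values are compared, so the following is only a sketch under the assumption that the coagulator accepts a client only when its reported revision equals the expected one. `check_revision` is a hypothetical helper, not a function from STAR.

```python
def check_revision(expected_rev, client_rev):
    """Return (ok, message) for a connecting client's reported revision.

    Assumption: an exact match is required. The real comparison in the
    coagulator is not visible in this diff.
    """
    if client_rev == expected_rev:
        return True, "ok"
    if client_rev < expected_rev:
        return False, f"client revision {client_rev} is older than required revision {expected_rev}"
    return False, f"client revision {client_rev} is newer than server revision {expected_rev}"


# A revision 3 client talking to a revision 4 coagulator would be refused.
ok, message = check_revision(4, 3)
```

If the check works roughly like this, forgetting to bump one of the three constants would lock out every freshly downloaded client, which would explain why they change together.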

readme.md

Lines changed: 1 addition & 1 deletion
@@ -1,5 +1,5 @@
 # Speech To Audio Relay (STAR)
-[Download windows client](https://github.com/samtupy/star/releases/latest/download/STAR_win64_v3.zip)
+[Download windows client](https://github.com/samtupy/star/releases/latest/download/STAR_win64_v4.zip)

 This is a set of components intended to ease the creation of audio productions that involve the synthesis of text to speech to audio, particularly where many voices that might be contained on any number of different computers or devices are involved.

user/STAR.py

Lines changed: 5 additions & 3 deletions
@@ -19,7 +19,7 @@
 import websockets.uri
 import wx

-USER_REVISION = 3
+USER_REVISION = 4

 speech = accessible_output2.outputs.auto.Auto()
 sound_output=output.Output(0)
@@ -354,6 +354,7 @@ def __init__(self, parent = None):
 self.websocket = None
 self.script_continuous_preview = False
 self.render_total = 0
+self.last_renderable_lines = []
 self.Show()
 self.Centre()
 sizer = wx.BoxSizer(wx.VERTICAL)
@@ -526,13 +527,14 @@ def on_render(self, evt):
 self.render_output_path_tmp = tempfile.TemporaryDirectory()
 self.render_output_path = self.render_output_path_tmp.name
 if (not "clear_output_on_render" in config or config.as_bool("clear_output_on_render")) and self.render_title.Value:
-    [os.remove(i) for i in glob.glob(os.path.join(config.get("render_path", os.path.join(os.getcwd(), "output")), self.render_title.Value, "*.wav"))]
+    [os.remove(i) for i in glob.glob(os.path.join(config.get("render_path", os.path.join(os.getcwd(), "output")), self.render_title.Value, "*.*"))]
 self.render_btn.Label = "Cancel"
 if selected_renderable_lines: renderable_lines = selected_renderable_lines
 self.render_total = len(renderable_lines)
 self.render_progress.Range = self.render_total
 self.render_progress.Value = 0
 self.render_progress.Show()
+self.last_renderable_lines = renderable_lines
 for l in renderable_lines:
     if not self.render_total: return # render canceled
     self.audiospeak(l[1], render_filename = l[0])
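
Review note: the cleanup pattern widens from `"*.wav"` to `"*.*"` so that clips in other formats are also removed before a new render. A minimal sketch of the same cleanup written as a plain loop, assuming only regular files should be deleted. `clear_render_output` is a hypothetical helper, not part of STAR.

```python
import glob
import os
import tempfile


def clear_render_output(directory):
    """Delete every regular file matching *.* in directory; return the count."""
    removed = 0
    for path in glob.glob(os.path.join(directory, "*.*")):
        # "*.*" also matches directories whose names contain a dot,
        # and os.remove would raise on those, so skip anything that
        # is not a regular file.
        if os.path.isfile(path):
            os.remove(path)
            removed += 1
    return removed


# Demonstration in a throwaway directory.
with tempfile.TemporaryDirectory() as tmp:
    for name in ("1.wav", "2.mp3", "notes"):
        open(os.path.join(tmp, name), "w").close()
    removed_count = clear_render_output(tmp)
    remaining = sorted(os.listdir(tmp))
```

Note that `"*.*"` only matches names containing a dot, so an extensionless file such as `notes` survives, while any unrelated file with an extension in the same folder is deleted along with the audio.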
@@ -543,7 +545,7 @@ def on_render_complete(self, canceled = False):
 if not canceled:
     title = self.render_title.Value
     if os.path.splitext(title)[1] in [".wav", ".mp3"]:
-        items = [i for i in glob.glob(os.path.join(self.render_output_path, "*.wav"))]
+        items = [os.path.join(self.render_output_path, i[0] + "." + self.speech_cache[i[1]]["extension"]) for i in self.last_renderable_lines]
         combined = AudioSegment(data = b"", sample_width = 2, frame_rate = 44100, channels = 1)
         for i in items:
             if len(combined) > 0: combined += AudioSegment.silent(config.as_int("render_consolidated_silence") if "render_consolidated_silence" in config else 200)
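
Review note: the consolidated render now builds its clip list from `self.last_renderable_lines` instead of globbing for `*.wav`. A minimal sketch of why that helps, assuming each renderable line is a `(filename, cache_key)` pair and that the cache maps each key to a dict with an `"extension"` entry without a leading dot (both inferred from the expression in the diff). `build_render_paths` and the sample data are hypothetical.

```python
import os


def build_render_paths(output_dir, renderable_lines, speech_cache):
    """Return the rendered clip paths in the order the lines were rendered."""
    return [
        os.path.join(output_dir, name + "." + speech_cache[key]["extension"])
        for name, key in renderable_lines
    ]


# Hypothetical script: three clips, one of which a provider delivered as mp3.
lines = [("10_intro", "a"), ("2_reply", "b"), ("1_outro", "c")]
cache = {
    "a": {"extension": "wav"},
    "b": {"extension": "mp3"},
    "c": {"extension": "wav"},
}
paths = build_render_paths("out", lines, cache)
names = [os.path.basename(p) for p in paths]
```

A `*.wav` glob would skip the mp3 clip entirely and would return the remaining files in whatever order the filesystem lists them, which matches the changelog's remark that all clips now get rendered and in order.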

user/readme.md

Lines changed: 17 additions & 1 deletion
@@ -1,5 +1,5 @@
 # STAR user client documentation
-The STAR user client is the frontend interface to this relay system. With it, you can connect to any coagulator you know about before synthesizing text into either audio that is played through speakers or rendered to audio files.
+The STAR user client is the frontend interface to this text to speech relay system. With it, you can connect to any coagulator you know about before synthesizing text into audio that can either be played through speakers or rendered to audio files.

 If you are trying to learn how to host a coagulator so that your friends can share voices, then this [coagulator quickstart guide](https://github.com/samtupy/star/blob/main/coagulator/readme.md) on STAR's github will help you do that.

@@ -137,6 +137,22 @@ Each provider has a --configure command line option. So if you run balcony.exe -
 You can then run balcony.exe or sam.exe standalone and the voices will be shared using the set configuration. It's common to create shortcuts to the providers and place them in the shell:startup location accessible from the run dialog, causing voices to be shared to a list of coagulators on system boot.

 ## Change log
+### Revision 4
+This update to STAR contains all changes to the project that have taken place over the last 4+ months, including a slightly better visual UI, more providers, the coagulator web frontend, security/stability and bugfixes.
+* Improves the visual layout for the user client UI, it's still very likely quite far from perfect.
+* New providers in the STAR source package: bestspeech / Keynote Gold, openai, elevenlabs, and googlecloud.
+* Though it still needs work, at least somewhat improved the consolidated render feature. Now at least all the clips get rendered and in order too, though it's still a bit slow and has weird resampling.
+* The coagulator now provides an http frontend and API as a lightweight alternative to the STAR client.
+* Fixed a bug in the balcony provider which could cause text containing quotes to be output through speakers!
+* Major provider stability improvements, from the ability to specify maximum concurrent requests to vastly improved synthesis cancelation to general robustness including 10mb default max packet size. Before this update, providers would easily crash if too much text was fed to it. Now they handle that situation much more gracefully.
+* Fix bug in user client which was causing render complete noise to be played on synthesis error.
+* The STAR repository now includes a script which requests permission for macsay to be able to access and provide your MacOS personal voices!
+* STAR can now handle audio in formats other than wav when required. For example some cloud services actually offer the best sounding quality as mp3 or vorbis, and it would just be a waste of bandwidth to deceptively decode to wav before providing.
+* Implemented default pitch and rate functionality into the provider, sets macsay's default rate to 195wpm.
+* Minor provider code cleanup, including reducing very noisy error output when connections can't be established.
+* Fixed user client not reporting synthesis errors sent from a provider.
+* Fixed broken SAPI4 voice selection when a SAPI4 and SAPI5 voice existed with the same name.
+* Minor documentation updates including correcting a misdocumented keyboard shortcut.
 ### Revision 3
 This is a major update to STAR which includes a complete user client rewrite and consequently the introduction of several useful features.
 * The user client was completely rewritten from scratch in python and WX Widgets, meaning that though feedback must still be gathered to make it look right or even to insure that controls are visible at all, the user client should soon be able to be used without a screen reader within a couple of revisions!
