SuperTonic Voice Mixer

voice-mixer.py is a small PyQt5 tool to explore, mix, and modify existing Voice Style JSON files shipped with SuperTonic TTS. It lets you visually edit the style_ttl latent space, apply DSP-style transforms, remix several voices, and listen to the result immediately via the built-in TTS preview.

What this project does

Loads an existing SuperTonic voice style JSON (style_ttl / style_dp).
Visualizes the timbre latent (style_ttl) as an interactive heatmap.
Applies geometric and DSP operations to reshape the latent space.
Mixes multiple existing voices into a single “remix” style.
Generates speech with the current style and plays it directly.
Saves the modified style back to a SuperTonic-compatible JSON file.

Installation

You need Python 3 and these packages (install via pip):

pip install numpy sounddevice matplotlib PyQt5 onnxruntime

Option 1 – Use inside an existing SuperTonic installation

Install SuperTonic TTS from its official repository.
Copy voice-mixer.py into the same directory that contains SuperTonic’s py/ folder.
Ensure the following paths exist relative to voice-mixer.py:
- assets/onnx/ – ONNX models and config files from the official SuperTonic repo (duration_predictor, text_encoder, vector_estimator, vocoder, tts.json, unicode_indexer.json).
- assets/voice_styles/ – folder with your existing voice style JSON files.

Option 2 – Clone this repo and add assets

Clone this repository.
Copy supertone/py/assets/ from the SuperTonic repo into this folder so you end up with assets/onnx and assets/voice_styles.
Alternatively, download assets.zip from the releases of this repo and unzip it next to voice-mixer.py:
- assets.zip : assets.zip

Expected folder structure

For the program to run, your directory must look like this:

your-folder/
│
├── voice-mixer.py
├── helper.py
│
└── assets/
    ├── config.json
    ├── LICENSE
    ├── README.md
    ├── onnx/
    │   ├── duration_predictor.onnx
    │   ├── text_encoder.onnx
    │   ├── vector_estimator.onnx
    │   ├── vocoder.onnx
    │   ├── tts.json
    │   ├── tts.yml
    │   └── unicode_indexer.json
    │
    └── voice_styles/
        ├── F1.json
        ├── F2.json
        ├── M1.json
        └── M2.json

Running

From the folder containing voice-mixer.py:

python voice-mixer.py

The app will look for the ONNX models in assets/onnx.

GUI overview

Header bar

Load Voice JSON Opens a single voice style JSON (e.g. assets/voice_styles/*.json). Loads both style_ttl and style_dp. The style_ttl is shown in the heatmap; style_dp is kept as the reference duration/prosody style.
Save Voice JSON Saves the current edited style_ttl plus the current style_dp into a new SuperTonic-compatible JSON file. The output keeps the expected dimensions ([1, 50, 256] for style_ttl, [1, 8, 16] for style_dp) and adds simple metadata.
Reset to Original Restores the style_ttl heatmap to the originally loaded (or remixed) style.
Filename label Shows the currently active voice file (or “Remix (N voices)” after a mix).

Multi-Voice Mixer

Load Library (2+ files) Select multiple JSON voice styles at once. Each style is stored internally for mixing.
Remix Library Creates a new style by taking a random convex combination of all loaded voices (weights sum to 1). The result becomes the current “original” style displayed in the heatmap and can be further edited or saved as a new JSON.
Status label Shows how many voices are loaded and whether remixing is available.

Timbre Heatmap (Style TTL)

This large panel shows style_ttl as a 2D matrix (time vs features).

Mouse interaction
- Left-click: shift columns (features) to the right (circular roll).
- Right-click: shift rows (time/tokens) down (circular roll).

The title displays the current shape, and axis labels remind you of the click actions.

Latent Operations

All operations act on the currently visible style_ttl matrix.

Row 1 – Geometric / calculus

Mirror X Flip left–right (feature axis).
Mirror Y Flip top–bottom (time axis).
Invert Sign Multiply all values by −1.
Derivative Apply a gradient along the time axis to highlight changes.
Rand Shift Randomly rolls the matrix along time and feature axes, simulating random click-drift in both directions.

Row 2 – DSP-style operations

These are creative, experimental transforms on the latent space:

Sharpen Adds a derivative along features to accentuate “edges” in the latent pattern.
Quantize Rounds values to a coarse grid (bit-crush-like effect).
Echo Adds a delayed, attenuated copy along the feature axis (spectral smear).
Tremolo Applies a sinusoidal amplitude modulation across features (spectral ripple).
Jitter Multiplies each cell by a random factor in a small range.

Row 3 – Scalar math and stats

Val + Add Adds a scalar offset to all cells (use small values, e.g. 0.01–0.1).
Factor + Multiply Scales all cells by a scalar factor.
Min / Max label Shows the current minimum and maximum of the matrix, useful to keep the style values within a reasonable range.

Inference Settings

Speed slider Controls speech speed by scaling predicted durations. Values above 1.0 make speech faster/shorter; below 1.0 slower/longer.
Steps spinbox Number of refinement steps for the latent diffusion process (1–50). Higher values generally improve quality but take longer.

Text and Playback

Text box Enter the text you want to synthesize. Multi-sentence input is supported; long text is internally chunked.
Generate & Play Runs the TTS pipeline using the current style_ttl and style_dp, then plays the audio with sounddevice. CPU inference is used by default.

Saving and using new styles in SuperTonic

When you click Save Voice JSON, the tool writes a standard SuperTonic voice style JSON. You can:

Place the saved file into SuperTonic’s assets/voice_styles/ folder (or wherever your installation expects style JSONs).
Configure SuperTonic to use that style as you would any of the built-in ones.

This lets you start from existing voices and iteratively sculpt new variants, all while staying compatible with the SuperTonic TTS pipeline.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Repository files navigation

SuperTonic Voice Mixer

What this project does

Installation

Option 1 – Use inside an existing SuperTonic installation

Option 2 – Clone this repo and add assets

Expected folder structure

Running

GUI overview

Header bar

Multi-Voice Mixer

Timbre Heatmap (Style TTL)

Latent Operations

Row 1 – Geometric / calculus

Row 2 – DSP-style operations

Row 3 – Scalar math and stats

Inference Settings

Text and Playback

Saving and using new styles in SuperTonic

About

Uh oh!

Releases 1

Packages

Languages

Name		Name	Last commit message	Last commit date
Latest commit History 5 Commits
LICENSE		LICENSE
README.md		README.md
helper.py		helper.py
voice-mixer.py		voice-mixer.py

License

Topping1/Supertonic-Voice-Mixer

Folders and files

Latest commit

History

Repository files navigation

SuperTonic Voice Mixer

What this project does

Installation

Option 1 – Use inside an existing SuperTonic installation

Option 2 – Clone this repo and add assets

Expected folder structure

Running

GUI overview

Header bar

Multi-Voice Mixer

Timbre Heatmap (Style TTL)

Latent Operations

Row 1 – Geometric / calculus

Row 2 – DSP-style operations

Row 3 – Scalar math and stats

Inference Settings

Text and Playback

Saving and using new styles in SuperTonic

About

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases 1

Packages 0

Languages

Packages