This document provides the API reference for TTS Studio's Python API and its hexagonal architecture.

TTS Studio follows a hexagonal architecture (Ports & Adapters) with a clear separation of concerns:
- API Layer (`api/`) - Main entry point for external consumers
- Application Layer (`app/`) - Use cases and orchestration
- Domain Layer (`domain/`) - Pure business logic
- Infrastructure Layer (`infra/`) - Adapters for external systems
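To illustrate how these layers interact, here is a minimal, hypothetical sketch of the Ports & Adapters pattern. The class names (`AudioOutputPort`, `ConsoleAudioAdapter`, `PlaybackUseCase`) are illustrative only and are not part of the TTS Studio API:

```python
from abc import ABC, abstractmethod


# Domain layer: a "port" describing what the application needs,
# with no reference to any concrete audio backend.
class AudioOutputPort(ABC):
    @abstractmethod
    def play(self, samples: list[float]) -> str:
        ...


# Infrastructure layer: an "adapter" implementing the port
# against a concrete external system (here just a stub).
class ConsoleAudioAdapter(AudioOutputPort):
    def play(self, samples: list[float]) -> str:
        return f"played {len(samples)} samples"


# Application layer: a use case that depends only on the port,
# so adapters can be swapped without touching business logic.
class PlaybackUseCase:
    def __init__(self, output: AudioOutputPort) -> None:
        self.output = output

    def run(self, samples: list[float]) -> str:
        return self.output.play(samples)


if __name__ == "__main__":
    use_case = PlaybackUseCase(ConsoleAudioAdapter())
    print(use_case.run([0.0, 0.5, -0.5]))  # prints "played 3 samples"
```

Because the use case depends only on the abstract port, a test double or a different backend can replace the adapter without changing application code.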
```python
from api.studio import TTSStudio

# Initialize the API
studio = TTSStudio()

# Create a voice profile
profile = studio.create_voice_profile(
    name="my_voice",
    sample_paths=["sample1.wav", "sample2.wav"]
)

# Generate audio
result = studio.generate_audio(
    profile_id=profile["profile"]["id"],
    text="Hello, world!"
)
```

Audio processing utilities for loading, normalizing, and converting audio files.
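The utilities documented below have straightforward NumPy equivalents. As a rough sketch of what their documented behavior amounts to (this is an illustration of the semantics, not the library's actual implementation):

```python
import numpy as np


def convert_to_mono(audio: np.ndarray) -> np.ndarray:
    """Average the channels of a 2D (channels x samples) array; pass 1D through."""
    if audio.ndim == 2:
        return audio.mean(axis=0)
    return audio


def normalize_audio(audio: np.ndarray, target_level: float = -20.0) -> np.ndarray:
    """Scale audio so its RMS level matches target_level in dB."""
    rms = np.sqrt(np.mean(audio ** 2))
    if rms == 0:
        return audio
    target_rms = 10 ** (target_level / 20.0)  # -20 dB -> RMS of 0.1
    return audio * (target_rms / rms)


def get_audio_duration(audio: np.ndarray, sample_rate: int) -> float:
    """Duration in seconds = number of samples / sample rate."""
    return len(audio) / sample_rate


# Chain them as a small preprocessing pipeline:
stereo = np.random.randn(2, 22050)          # 1 second of stereo noise at 22050 Hz
mono = convert_to_mono(stereo)
normalized = normalize_audio(mono, target_level=-20.0)
print(get_audio_duration(normalized, 22050))  # 1.0
```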
`load_audio(file_path: str, sample_rate: int = 22050) -> np.ndarray`

Load an audio file and return it as a numpy array.

Parameters:
- `file_path` (str): Path to audio file (WAV, MP3, FLAC)
- `sample_rate` (int, optional): Target sample rate in Hz. Default: 22050

Returns:
- `np.ndarray`: Audio data as a 1D numpy array (mono)

Raises:
- `FileNotFoundError`: If the audio file doesn't exist
- `ValueError`: If the audio file format is unsupported
- `RuntimeError`: If audio loading fails

Example:

```python
from voice_clone.audio import load_audio

audio = load_audio("sample.wav", sample_rate=22050)
print(f"Audio shape: {audio.shape}")
print(f"Duration: {len(audio) / 22050:.2f} seconds")
```

`normalize_audio(audio: np.ndarray, target_level: float = -20.0) -> np.ndarray`

Normalize audio to a target dB level.
Parameters:
- `audio` (np.ndarray): Input audio array
- `target_level` (float, optional): Target level in dB. Default: -20.0

Returns:
- `np.ndarray`: Normalized audio array

Example:

```python
from voice_clone.audio import load_audio, normalize_audio

audio = load_audio("sample.wav")
normalized = normalize_audio(audio, target_level=-20.0)
```

`convert_to_mono(audio: np.ndarray) -> np.ndarray`

Convert stereo audio to mono by averaging channels.
Parameters:
- `audio` (np.ndarray): Input audio (1D for mono, 2D for stereo)

Returns:
- `np.ndarray`: Mono audio as a 1D array

Example:

```python
from voice_clone.audio import load_audio, convert_to_mono

audio = load_audio("stereo.wav")
mono = convert_to_mono(audio)
```

`save_audio(audio: np.ndarray, file_path: str, sample_rate: int = 22050) -> None`

Save an audio array to a file.
Parameters:
- `audio` (np.ndarray): Audio data to save
- `file_path` (str): Output file path
- `sample_rate` (int, optional): Sample rate in Hz. Default: 22050

Returns: None

Raises:
- `ValueError`: If audio data is invalid
- `IOError`: If the file cannot be written

Example:

```python
from voice_clone.audio import save_audio
import numpy as np

audio = np.random.randn(22050)  # 1 second of noise
save_audio(audio, "output.wav", sample_rate=22050)
```

`get_audio_duration(audio: np.ndarray, sample_rate: int) -> float`

Calculate audio duration in seconds.
Parameters:
- `audio` (np.ndarray): Audio data
- `sample_rate` (int): Sample rate in Hz

Returns:
- `float`: Duration in seconds

Example:

```python
from voice_clone.audio import load_audio, get_audio_duration

audio = load_audio("sample.wav", sample_rate=22050)
duration = get_audio_duration(audio, 22050)
print(f"Duration: {duration:.2f} seconds")
```

Model management and loading for Qwen3-TTS.
`VoiceModel`

Represents a trained voice model.

Attributes:
- `model_path` (str): Path to model directory
- `language` (str): Model language code
- `sample_rate` (int): Model sample rate

Methods:

`__init__(model_path: str, language: str = "en")`

Initialize the voice model.

Parameters:
- `model_path` (str): Path to model directory
- `language` (str, optional): Language code. Default: "en"

Example:

```python
from voice_clone.model import VoiceModel

model = VoiceModel("data/models/my_voice", language="en")
```

`load() -> None`

Load the model into memory.
Returns: None

Raises:
- `FileNotFoundError`: If model files don't exist
- `RuntimeError`: If model loading fails

Example:

```python
model = VoiceModel("data/models/my_voice")
model.load()
```

`is_loaded() -> bool`

Check whether the model is loaded.

Returns:
- `bool`: True if the model is loaded, False otherwise

Example:

```python
if not model.is_loaded():
    model.load()
```

`train_model(samples_dir: str, output_path: str, language: str = "en", **kwargs) -> VoiceModel`

Train a new voice model from audio samples.
Parameters:
- `samples_dir` (str): Directory containing audio samples
- `output_path` (str): Output path for the trained model
- `language` (str, optional): Language code. Default: "en"
- `**kwargs`: Additional training parameters

Returns:
- `VoiceModel`: Trained voice model instance

Raises:
- `FileNotFoundError`: If the samples directory doesn't exist
- `ValueError`: If insufficient samples are provided
- `RuntimeError`: If training fails

Example:

```python
from voice_clone.model import train_model

model = train_model(
    samples_dir="data/samples/my_voice",
    output_path="data/models/my_voice",
    language="en",
    min_duration=1.0,
    max_duration=10.0
)
```

`load_model(model_path: str, language: str = "en") -> VoiceModel`

Load an existing voice model.
Parameters:
- `model_path` (str): Path to model directory
- `language` (str, optional): Language code. Default: "en"

Returns:
- `VoiceModel`: Loaded voice model instance

Raises:
- `FileNotFoundError`: If the model doesn't exist
- `RuntimeError`: If model loading fails

Example:

```python
from voice_clone.model import load_model

model = load_model("data/models/my_voice", language="en")
```

Text-to-speech synthesis functionality.
`Synthesizer`

Text-to-speech synthesizer using Qwen3-TTS.

Attributes:
- `model` (VoiceModel): Voice model to use
- `sample_rate` (int): Output sample rate (12000 Hz for Qwen3-TTS)
- `temperature` (float): Synthesis temperature
- `speed` (float): Speech speed multiplier

Methods:

`__init__(model: VoiceModel, sample_rate: int = 22050)`

Initialize the synthesizer with a voice model.

Parameters:
- `model` (VoiceModel): Voice model instance
- `sample_rate` (int, optional): Output sample rate. Default: 22050

Example:

```python
from voice_clone.model import load_model
from voice_clone.synthesizer import Synthesizer

model = load_model("data/models/my_voice")
synthesizer = Synthesizer(model, sample_rate=22050)
```

`synthesize(text: str, temperature: float = 0.7, speed: float = 1.0) -> np.ndarray`

Synthesize speech from text.
Parameters:
- `text` (str): Text to synthesize
- `temperature` (float, optional): Synthesis temperature (0.1-1.0). Default: 0.7
- `speed` (float, optional): Speech speed multiplier (0.5-2.0). Default: 1.0

Returns:
- `np.ndarray`: Synthesized audio as a numpy array

Raises:
- `ValueError`: If parameters are out of range
- `RuntimeError`: If synthesis fails

Example:

```python
synthesizer = Synthesizer(model)
audio = synthesizer.synthesize(
    text="Hello, world!",
    temperature=0.7,
    speed=1.0
)
```

`synthesize_to_file(text: str, output_path: str, temperature: float = 0.7, speed: float = 1.0) -> None`
Synthesize speech and save to file.
Parameters:
- `text` (str): Text to synthesize
- `output_path` (str): Output file path
- `temperature` (float, optional): Synthesis temperature. Default: 0.7
- `speed` (float, optional): Speech speed multiplier. Default: 1.0

Returns: None

Example:

```python
synthesizer.synthesize_to_file(
    text="Hello, world!",
    output_path="output.wav",
    temperature=0.7,
    speed=1.0
)
```

`synthesize_text(model: VoiceModel, text: str, **kwargs) -> np.ndarray`

Convenience function to synthesize text with a model.
Parameters:
- `model` (VoiceModel): Voice model to use
- `text` (str): Text to synthesize
- `**kwargs`: Additional synthesis parameters

Returns:
- `np.ndarray`: Synthesized audio

Example:

```python
from voice_clone.model import load_model
from voice_clone.synthesizer import synthesize_text

model = load_model("data/models/my_voice")
audio = synthesize_text(model, "Hello, world!", temperature=0.7)
```

Configuration management for Voice Clone CLI.
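The `Config` class documented below reads its settings from YAML. A hypothetical `config/config.yaml` matching the documented attributes might look like this (the values shown are illustrative, not shipped defaults):

```yaml
model_path: data/models/my_voice   # Default model path
output_dir: output/                # Default output directory
sample_rate: 22050                 # Default sample rate in Hz
language: en                       # Default language (ISO 639-1)
temperature: 0.7                   # Default synthesis temperature
speed: 1.0                         # Default speech speed multiplier
```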
`Config`

Application configuration.

Attributes:
- `model_path` (str): Default model path
- `output_dir` (str): Default output directory
- `sample_rate` (int): Default sample rate
- `language` (str): Default language
- `temperature` (float): Default temperature
- `speed` (float): Default speed

Methods:

`load_from_file(config_path: str) -> Config`

Load configuration from a YAML file.

Parameters:
- `config_path` (str): Path to config file

Returns:
- `Config`: Configuration instance

Example:

```python
from voice_clone.config import Config

config = Config.load_from_file("config/config.yaml")
```

Common type aliases used throughout the API:

```python
from typing import Union, Optional, List
import numpy as np
from pathlib import Path

# Audio data type
AudioArray = np.ndarray

# File path type
FilePath = Union[str, Path]

# Language code type
LanguageCode = str  # ISO 639-1 codes: "en", "es", "fr", etc.

# Sample rate type
SampleRate = int  # Hz, typically 16000, 22050, or 44100
```

Exception types:

- Base exception for Voice Clone CLI errors.
- `ModelNotFoundError`: Raised when model files cannot be found.
- Raised when audio processing fails.
- Raised when speech synthesis fails.
Example:

```python
from voice_clone.exceptions import ModelNotFoundError

try:
    model = load_model("nonexistent/path")
except ModelNotFoundError as e:
    print(f"Model not found: {e}")
```

A complete training-and-synthesis workflow:

```python
from voice_clone.model import train_model, load_model
from voice_clone.synthesizer import Synthesizer
from voice_clone.audio import save_audio

# Train a model
model = train_model(
    samples_dir="data/samples/my_voice",
    output_path="data/models/my_voice",
    language="en"
)

# Or load an existing model
model = load_model("data/models/my_voice")

# Create a synthesizer
synthesizer = Synthesizer(model, sample_rate=22050)

# Synthesize speech
audio = synthesizer.synthesize(
    text="Hello, this is a test of voice cloning.",
    temperature=0.7,
    speed=1.0
)

# Save to file
save_audio(audio, "output.wav", sample_rate=22050)
```

Synthesizing multiple texts in a loop:

```python
from voice_clone.model import load_model
from voice_clone.synthesizer import Synthesizer

model = load_model("data/models/my_voice")
synthesizer = Synthesizer(model)

texts = [
    "First sentence to synthesize.",
    "Second sentence to synthesize.",
    "Third sentence to synthesize."
]

for i, text in enumerate(texts):
    synthesizer.synthesize_to_file(
        text=text,
        output_path=f"output_{i}.wav"
    )
```

Voice Clone CLI automatically uses the GPU if one is available:
```python
import torch

# Check GPU availability
if torch.cuda.is_available():
    print(f"Using GPU: {torch.cuda.get_device_name(0)}")
else:
    print("Using CPU")
```

For large batch processing, work through the texts in smaller batches and free GPU memory between them:

```python
import torch

# Process in smaller batches
batch_size = 10
for i in range(0, len(texts), batch_size):
    batch = texts[i:i + batch_size]
    # Process batch (e.g. synthesize each text)
    torch.cuda.empty_cache()  # Clear cached GPU memory between batches
```

To add new API functionality:
- Add the function/class to the appropriate module
- Include type hints for all parameters and returns
- Write comprehensive docstrings
- Add unit tests
- Update this API documentation
- Run `make pre-commit` to validate
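As a sketch of what the checklist above asks for, here is a hypothetical new utility with type hints, a docstring, and a unit test. The function `trim_silence` is illustrative only and is not part of the library:

```python
import numpy as np


def trim_silence(audio: np.ndarray, threshold: float = 1e-3) -> np.ndarray:
    """Remove leading and trailing samples below an amplitude threshold.

    Parameters:
        audio: 1D audio array.
        threshold: Absolute amplitude below which samples count as silence.

    Returns:
        The trimmed 1D audio array (empty if everything is silence).
    """
    loud = np.flatnonzero(np.abs(audio) >= threshold)
    if loud.size == 0:
        return audio[:0]
    return audio[loud[0]:loud[-1] + 1]


# A minimal unit test, as the checklist requires
def test_trim_silence() -> None:
    audio = np.array([0.0, 0.0, 0.5, -0.2, 0.0])
    assert np.array_equal(trim_silence(audio), np.array([0.5, -0.2]))


test_trim_silence()
```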
- Usage Guide - CLI usage examples
- Development Guide - Development setup
- Installation Guide - Installation instructions