🎙️ AI Voice Studio (Advanced Multi-Engine Suite)

AI Voice Studio is a professional-grade audio production environment designed for high-fidelity documentary narration. It bridges the gap between mechanical Text-to-Speech and genuine human-like performance by connecting your production pipeline directly to frontier AI models.

Download the app from the Releases page, or try the app hosted on Google Cloud.

🚀 Key Features

Multi-Engine Synthesis: Seamlessly switch between Gemini 2.5/3.0, ElevenLabs Multilingual v2, and Resemble AI.
Universal Translation Layer: Uses Gemini 3 Pro to translate your script before synthesis while preserving SSML tags and emotional cues.
Vocal Identity Cloning: Create instruction-based voice profiles by analyzing your own voice samples to generate "Identity Instruction" sets for the Gemini TTS engine.
Long-Form Production: Automatic script chunking for narrations longer than 10,000 characters, with gapless concatenation of the resulting audio.
Pronunciation Dictionary: Define custom rules for technical terms, names, or localized jargon.
Store-Ready Architecture: Full support for Mac App Store (MAS) Sandboxing, Microsoft AppX, and Linux Flatpak distribution.
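Two of the features above, the Pronunciation Dictionary and Long-Form Production, are text and audio processing steps whose internals the README does not describe. The following is a minimal sketch under stated assumptions: rules are applied as whole-word replacements, chunks break at sentence ends, and each engine returns 16-bit PCM at a shared sample rate. All names (`applyPronunciations`, `chunkScript`, `concatPcm`) are hypothetical and not taken from the repository.

```typescript
// Hypothetical sketch: none of these names come from the AI Voice Studio source.

// A pronunciation rule: replace `term` with a phonetic spelling before synthesis.
interface PronunciationRule {
  term: string;
  say: string;
}

// Escape regex metacharacters so terms like "v2.5" match literally.
function escapeRegExp(s: string): string {
  return s.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Apply each rule as a whole-word, case-sensitive replacement. The \b anchors
// assume each term starts and ends with a word character.
function applyPronunciations(script: string, rules: PronunciationRule[]): string {
  return rules.reduce((text, rule) => {
    const pattern = new RegExp(`\\b${escapeRegExp(rule.term)}\\b`, "g");
    // A replacer function keeps "$" in `say` from being read as a pattern.
    return text.replace(pattern, () => rule.say);
  }, script);
}

// Split a script into chunks of at most maxChars, breaking at sentence ends.
function chunkScript(script: string, maxChars: number): string[] {
  const matches: string[] = script.match(/[^.!?]+(?:[.!?]+|$)/g) ?? [];
  const sentences = matches.map((s) => s.trim()).filter((s) => s.length > 0);
  const chunks: string[] = [];
  let current = "";
  for (const sentence of sentences) {
    const candidate = current ? `${current} ${sentence}` : sentence;
    if (candidate.length <= maxChars) {
      current = candidate;
      continue;
    }
    if (current) chunks.push(current);
    // Fallback: hard-split a single sentence that is longer than maxChars.
    let rest = sentence;
    while (rest.length > maxChars) {
      chunks.push(rest.slice(0, maxChars));
      rest = rest.slice(maxChars);
    }
    current = rest;
  }
  if (current) chunks.push(current);
  return chunks;
}

// Gapless concatenation: copy each chunk's 16-bit PCM samples back to back
// with no silence in between (assumes all chunks share one sample rate).
function concatPcm(parts: Int16Array[]): Int16Array {
  const out = new Int16Array(parts.reduce((n, p) => n + p.length, 0));
  let offset = 0;
  for (const p of parts) {
    out.set(p, offset);
    offset += p.length;
  }
  return out;
}
```

In this sketch, dictionary rules run first so that replacements are counted toward each chunk's length. Each chunk is then synthesized separately and its samples are passed to `concatPcm` in order.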

🌍 Supported Languages (Hybrid Registry)

The Universal Translation Layer and multi-engine synthesis support the following 41 languages, including classical and indigenous languages:

Arabic (AR)
Aymara
Catalan
Cherokee
Chinese (ZH)
Danish
Dutch (NL)
English (UK & US)
Flemish
French (FR)
Georgian
German (DE)
Guarani
Hindi (IN)
Indonesian (ID)
Italian (IT)
Japanese (JP)
Kannada
Khmer
Korean (KR)
Lao
Latin
Latvian
Maltese
Maya (Yucatec)
Nahuatl
Navajo
Nepali
Norwegian
Polish (PL)
Portuguese (BR)
Quechua
Russian (RU)
Sinhala
Spanish (ES)
Swedish (SE)
Tamil
Thai (TH)
Turkish (TR)
Vietnamese (VN)

Narrator Creator Academy Help Section

🛠️ How it Works

1. Neural Identity Analysis

The "Custom Voice" feature uses a Zero-Shot Analysis technique. When you provide a sample, the Gemini model decomposes the audio into a parametric text-based instruction set (Pitch, Tone, Pace, Prosody). This instruction is then fed as a system prompt to the TTS engine to modulate the base voice into your "Identity."
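The analysis step above produces a text instruction set that is later sent as a system prompt. A minimal sketch of that hand-off follows, assuming the analysis model is asked to reply with a JSON object holding one string per parameter. The `VoiceIdentity` shape, `parseIdentity`, and `buildIdentityInstruction` are hypothetical names for illustration, not the app's actual types or API calls.

```typescript
// Hypothetical shape of the parametric instruction set (Pitch, Tone, Pace,
// Prosody). The field names are illustrative, not taken from the app's types.
interface VoiceIdentity {
  pitch: string;
  tone: string;
  pace: string;
  prosody: string;
}

const IDENTITY_FIELDS = ["pitch", "tone", "pace", "prosody"] as const;

// Validate the analysis step's reply, assuming the model was asked to answer
// with a JSON object containing one string per field.
function parseIdentity(json: string): VoiceIdentity {
  const data: unknown = JSON.parse(json);
  if (typeof data !== "object" || data === null) {
    throw new Error("Identity analysis must be a JSON object");
  }
  const record = data as Record<string, unknown>;
  for (const field of IDENTITY_FIELDS) {
    if (typeof record[field] !== "string") {
      throw new Error(`Missing or non-string field: ${field}`);
    }
  }
  return {
    pitch: record.pitch as string,
    tone: record.tone as string,
    pace: record.pace as string,
    prosody: record.prosody as string,
  };
}

// Render the identity as plain text to be sent as the TTS system prompt.
function buildIdentityInstruction(id: VoiceIdentity): string {
  return [
    "Speak as the following narrator identity.",
    `Pitch: ${id.pitch}.`,
    `Tone: ${id.tone}.`,
    `Pace: ${id.pace}.`,
    `Prosody: ${id.prosody}.`,
  ].join("\n");
}
```

Validating the reply before building the prompt means a malformed analysis fails early instead of silently producing an incomplete identity.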

2. Hybrid Translation Pipeline

When narrating in a non-English language:

Source Script: the script is passed to Gemini 3 Pro.
SSML-Aware Translation: the script is translated so that SSML tags such as <break> and bracketed cues such as [Whisper] stay in the correct semantic position.
Target Language Audio: the translated script is synthesized by your selected engine (ElevenLabs, Gemini, or Resemble AI).
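The README states that Gemini 3 Pro keeps tags in place during translation, but not how that is enforced. One common safeguard, shown here as a swapped-in technique rather than the app's confirmed method, is to mask each tag with a numbered placeholder before translation, then restore the tags and reject any translation that dropped or duplicated a placeholder. `maskTags`, `unmaskTags`, and the `{{n}}` placeholder format are hypothetical.

```typescript
// Matches SSML/XML-style tags such as <break time="500ms"/> and bracketed
// cues such as [Whisper].
const TAG_PATTERN = /<[^>]+>|\[[^\]]+\]/g;

interface MaskedScript {
  text: string; // script with every tag replaced by a numbered placeholder
  tags: string[]; // original tags, indexed by placeholder number
}

// Replace each tag with {{0}}, {{1}}, ... before sending text to translation.
function maskTags(script: string): MaskedScript {
  const tags: string[] = [];
  const text = script.replace(TAG_PATTERN, (tag) => {
    tags.push(tag);
    return `{{${tags.length - 1}}}`;
  });
  return { text, tags };
}

// Restore tags after translation, failing loudly if any placeholder was
// dropped or duplicated by the translator.
function unmaskTags(translated: string, tags: string[]): string {
  tags.forEach((_, i) => {
    const count = translated.split(`{{${i}}}`).length - 1;
    if (count !== 1) {
      throw new Error(`Placeholder {{${i}}} appears ${count} times`);
    }
  });
  return translated.replace(/\{\{(\d+)\}\}/g, (_, n: string) => tags[Number(n)] ?? "");
}
```

Because the translator sees placeholders rather than markup, it can move a pause to the grammatically correct position in the target language, and the check in `unmaskTags` catches any cue it lost.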

🏗️ Build & Publish

```bash
# 1. Install dependencies
npm install

# 2. Compile the web application
npm run build

# 3. Package for distribution (Windows/Mac/Linux)
npm run electron:package
```

"Perform your narrative, don't just generate it."
Developed by Ajarn Spencer Littlewood
