MARS8: A new family of state-of-the-art TTS models
MARS8 achieves state-of-the-art speech quality and speaker similarity in text-to-speech synthesis, excelling in challenging real-world voice cloning scenarios with minimal reference audio.
Evaluated head-to-head against leading TTS systems — Cartesia Sonic-3, ElevenLabs Multilingual v2/v3, and Minimax Speech-2.6-HD — MARS8 delivers top-tier results across all key metrics while maintaining exceptional voice fidelity from references as short as 2 seconds.
| Metric | MARS8-Pro | MARS8-Flash | Sonic-3 | Speech-2.6-HD | Multilingual v2 | Multilingual v3 |
|---|---|---|---|---|---|---|
| CE ↑ | 5.43 | 5.43 | 5.04 | 4.99 | 5.41 | 5.18 |
| PQ ↑ | 7.45 | 7.45 | 6.95 | 6.95 | 7.45 | 7.19 |
| CER ↓ | 5.77% | 5.67% | 8.54% | 11.30% | 4.39% | 14.62% |
| Wavlm-base-sv (cosine similarity) ↑ | 0.8676 | 0.8666 | 0.8420 | 0.8666 | 0.8109 | 0.8253 |
| CAM++ embedding (cosine similarity) ↑ | 0.7097 | 0.7066 | 0.5134 | 0.5878 | 0.3912 | 0.336 |
Key finding: MARS8 achieves state-of-the-art speaker similarity even with audio references as short as 2 seconds — a critical advantage for real-world applications where long, clean reference audio is rarely available.
To validate these claims, we developed the MAMBA Benchmark, a rigorous stress test designed to reflect the most demanding real-world conditions, rather than idealized studio environments.
The name MAMBA is intentional. Our team at CAMB deeply resonates with the mamba mentality: a relentless commitment to excellence, discipline, and continuous improvement. Kobe Bryant's legacy stands as a powerful testament to what sustained hard work and focus can achieve, even when starting as an underdog. In the same spirit, the MAMBA Benchmark embodies difficulty by design, prioritizing the hardest cases, not the easiest ones.
Today, we are open-sourcing the MAMBA Benchmark so the broader community can independently replicate and validate our results. Our goal is for MAMBA to serve not only as a transparent validation framework for our own models, but also as a durable, industry-grade benchmark against which future TTS systems can be evaluated.
| Statistic | Value |
|---|---|
| Total samples | 1,334 |
| Cross-language pairs | 70% |
| Average reference duration | 2.3s |
| Most common reference length | 2.0s |
| Total source audio | 101 min |
| Speech-only segments | 51 min |
Traditional TTS benchmarks rely on clean, long-form reference audio in controlled conditions. MAMBA challenges this by introducing:
- Cross-language voice cloning — 70% of samples require cloning across different languages, testing pronunciation robustness and identity preservation
- Ultra-short references — Average reference duration of just 2.3 seconds mirrors real-world constraints
- Expressive source audio — References contain natural expressiveness rather than neutral read speech
- Create a Camb.ai account — Sign up at camb.ai and generate an API key from your dashboard.

- Install FFmpeg

  ```shell
  # Ubuntu/Debian
  apt update && apt install -y ffmpeg

  # macOS
  brew update && brew install ffmpeg

  # Windows
  winget install ffmpeg
  ```

- Set up Python environment

  ```shell
  python3 -m venv venv
  source ./venv/bin/activate    # Linux/macOS
  # or: .\venv\Scripts\activate # Windows

  pip install -r requirements.txt
  ```

Step 1: Load audio data

```shell
python load_audio.py
```

This downloads and extracts audio from the sources defined in teasers.json.

Step 2: Clean and segment audio

```shell
python load_segments.py
```

This removes background noise using UVR-MDX-NET and splits audio into segments based on subtitle timing. Cleaned segments are saved to ./segments/.
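The subtitle-driven splitting can be sketched roughly as below. This is a minimal illustration only, not the actual load_segments.py implementation: the function names, SRT-style timestamps, and 16 kHz sample rate are assumptions for the sketch.

```python
import re

def srt_time_to_ms(timestamp: str) -> int:
    """Convert an SRT timestamp like '00:00:01,250' to milliseconds."""
    hours, minutes, rest = timestamp.split(":")
    seconds, millis = rest.split(",")
    return ((int(hours) * 60 + int(minutes)) * 60 + int(seconds)) * 1000 + int(millis)

def parse_srt_spans(srt_text: str):
    """Extract (start_ms, end_ms) pairs from SRT subtitle text."""
    timing = re.compile(r"(\d{2}:\d{2}:\d{2},\d{3}) --> (\d{2}:\d{2}:\d{2},\d{3})")
    return [(srt_time_to_ms(a), srt_time_to_ms(b)) for a, b in timing.findall(srt_text)]

def slice_segments(samples, spans, sample_rate=16000):
    """Cut a mono sample buffer into one segment per subtitle span."""
    segments = []
    for start_ms, end_ms in spans:
        start = start_ms * sample_rate // 1000
        end = end_ms * sample_rate // 1000
        segments.append(samples[start:end])
    return segments
```

For example, a subtitle spanning 00:00:01,000 to 00:00:03,000 yields one two-second segment; the real pipeline additionally denoises the audio with UVR-MDX-NET before slicing.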
MARS8 delivers consistently strong results across the evaluation dimensions that matter most for production deployments:
| Capability | MARS8 Advantage |
|---|---|
| Minimal reference requirements | High-fidelity cloning from 2s audio |
| Cross-language robustness | Strong performance on 70% cross-lingual test set |
| Pronunciation accuracy | 5.67% CER on multilingual content |
| Speaker identity preservation | 0.87 Wavlm-base-sv / 0.71 CAM++ embedding similarity scores |
All evaluations follow standardized protocols to ensure reproducibility:
| Metric | Method |
|---|---|
| Speaker similarity | Wavlm-base-sv (cosine similarity) and CAM++ embedding (cosine similarity) speaker verification models |
| Transcription accuracy | Character Error Rate (CER) via Whisper ASR |
| Quality assessment | CE (Content Enjoyment) and PQ (Production Quality) scores via Facebook's audiobox-aesthetics model |
The evaluation data, cleaning pipeline, and metric definitions are fully open-sourced.
| # | System | Link |
|---|---|---|
| 1 | Cartesia Sonic-3 | cartesia.ai |
| 2 | ElevenLabs Multilingual v2/v3 | elevenlabs.io |
| 3 | Minimax Speech-2.6-HD | minimax.io |
If you use this benchmark in your research, please cite:
```bibtex
@misc{mars8_2026,
  title  = {MARS8: State-of-the-art Text-to-Speech with Minimal Reference Audio},
  author = {Camb.ai},
  year   = {2026},
  note   = {Evaluated on the MAMBA Benchmark},
  url    = {https://github.com/Camb-ai/MAMBA-BENCHMARK}
}
```

This project is licensed under the MIT License — see LICENSE for details.


